PolyServe® Matrix Server Administration Guide

PolyServe® Matrix Server 3.
Copyright © 2004-2006 PolyServe, Inc. Use, reproduction and distribution of this document and the software it describes are subject to the terms of the software license agreement distributed with the product (“License Agreement”). Any use, reproduction, or distribution of this document or the described software not explicitly permitted pursuant to the License Agreement is strictly prohibited unless prior written permission from PolyServe has been received.
Contents

1 Introduction
    Product Features
    Overview
        The Structure of a Matrix
        Software Components
        Shared SAN Devices
    Virtual Hosts Tab
    Applications Tab
    Filesystems Tab
    Notifiers Tab
    Matrix Alerts
    Administrative Network Failover
    Making Network Changes
        Add or Modify a Network Interface
        Remove a Network Interface
        Allow or Discourage Administrative Traffic
        Enable or Disable a Network Interface for Virtual Hosting
    Create a Dynamic Volume
    Dynamic Volume Properties
    Extend a Dynamic Volume
    Destroy a Dynamic Volume
    Recreate a Dynamic Volume
    Resize a Filesystem Manually
    Destroy a Filesystem
    Recover an Evicted Filesystem
    Context Dependent Symbolic Links
        Examples
    Matrix-Wide File Locking
    Filter the Applications Display
    Using the Applications Tab
        "Drag and Drop" Operations
        Menu Operations
11 Configure Virtual Hosts
    Overview
    SHARED_FILESYSTEM Device Monitor
    Disk Device Monitor
    Custom Device Monitor
    Device Monitors and Failover
    Device Monitor Activeness Policy
    Add or Modify a Device Monitor
    Validate Load-Balancing When a Server Is Down
    Test LAN Failover of Administrative Matrix Traffic
16 Performance Monitoring
    View the Performance Dashboard
        Display the Dashboard for All Servers
        View Counter Details
        Display the Dashboard for One Server
19 Other Matrix Maintenance
    Maintain Log Files
        The matrix.log File
    Matrix Alerts
    Collect Log Files with mxcollect
    Create an FTP Account
    Network Interface Requirements Are Not Met
    mxinit Messages
    Fence Agent Error Messages
    Operational Problems
        Loss of Network Access or Unresponsive Switch
        Default VSAN Is Disabled on Cisco MDS FC Switch
1 Introduction

Matrix Server provides a matrix structure for managing a group of network servers and a Storage Area Network (SAN) as a single entity.

Product Features

Matrix Server provides the following features:

• Fully distributed data-sharing environment. The PSFS filesystem enables all servers in the matrix to directly access shared data stored on a SAN.
• Matrix-wide administration. The Management Console (a Java-based graphical user interface) and the corresponding command-line interface enable you to configure and manage the entire matrix either remotely or from any server in the matrix.

• Failover support for network applications. Matrix Server uses virtual hosts to provide highly available client access to mission-critical data for Web, e-mail, file transfer, and other TCP/IP-based applications.
Overview

The Structure of a Matrix

A matrix includes the following physical components. [Figure: servers attached to public LANs and an administrative network (LAN), with an FC switch connecting them to RAID subsystems; the public LANs connect to the Internet.]

Servers. Each server must be running Matrix Server.

Public LANs. A matrix can include up to four network interfaces per server.
For performance reasons, we recommend that these networks be isolated from the networks used by external clients to access the matrix.

Storage Area Network (SAN). The SAN includes FibreChannel switches and RAID subsystems. Disks in a RAID subsystem are imported into the matrix and managed from there. After a disk is imported, you can create PSFS filesystems on it.

Software Components

The Matrix Server software is installed on each server in the matrix and includes the following major components.
Management Console. Provides a graphical interface for configuring a Matrix Server matrix and monitoring its operation. The console can be run either remotely or from any server in the matrix.

Administrative Network. Handles Matrix Server administrative traffic. Most Matrix Server daemons communicate with each other over the administrative network.

ClusterPulse daemon.
PanPulse daemon. Selects and monitors the network to be used for the administrative network, verifies that all hosts in the matrix can communicate with each other, and detects any communication problems.

mxinit daemon. Starts or stops Matrix Server and monitors Matrix Server processes.

mxlogd daemon. Manages global error and event messages. The messages are written to the /var/log/polyserve/matrix.log file on each server.

mxlog module.
The PSFS filesystem provides the following features:

• Concurrent access by multiple servers. After a filesystem has been created on a shared disk, all servers having physical access to the device via the SAN can mount the filesystem. A PSFS filesystem must be consistently mounted either read-only or read-write across the matrix.

• Support for standard filesystem operations such as mkfs, mount, and umount.
• Volume database. This database stores information about dynamic volumes and is located on the membership partitions.

Virtual Hosts and Failover Protection

Matrix Server uses virtual hosts to provide failover protection for servers and network applications. A virtual host is a hostname/IP address configured on one or more servers. The network interfaces selected on those servers to participate in the virtual host must be on the same subnet.
Matrix Server includes several built-in service monitors for monitoring well-known network services. You can also configure custom monitors for other services.

A device monitor is similar to a service monitor; however, it is designed either to watch a part of a server such as a local disk drive or to monitor a PSFS filesystem. A device monitor is assigned to one or more servers. Matrix Server provides several built-in device monitors.
are associated with the four servers in the matrix. Each server is primary for one of the virtual hosts and backup for the other virtual hosts.

[Figure: client requests to www.xvz.com arrive over the public network (99.10.20); servers svr1 through svr4 share the administrative network (99.120.0), and each server is the primary for one of the virtual hosts xvz1 through xvz4; an FC switch gives all servers access to the /www and /httpd-logs filesystems.]

Two PSFS filesystems are mounted on each server.
• An HTTP service monitor is configured on each server. If this monitor detects that the HTTP service has failed, the associated virtual host will fail over to a backup server where the HTTP service is healthy. The client requests will then be processed on that server.

• Two SHARED_FILESYSTEM device monitors are configured on each server for the PSFS filesystems (/www and /httpd-logs). These monitors check the mount status and health of the mounted PSFS filesystems.
server, with a ratio of 8), paging can increase on the smallest server to the extent that overall matrix performance is significantly reduced.

Multipath I/O

Multipath I/O can be used in a matrix configuration to eliminate single points of failure. It supports the following:

• Up to four FibreChannel ports per server. If an FC port or its connection to the fabric should fail, the server can use another FC port to reach the fabric.

• Multiple FibreChannel switches.
Single FC Port, Single FC Switch, Single FC Fabric

This is the simplest configuration. Each server has a single FC port connected to an FC switch managed by the matrix. The SAN includes two RAID arrays. In this configuration, multiported SAN disks can protect against a port failure, but not a switch failure.
Single FC Port, Dual FC Switches, Single FC Fabric

In this example, the fabric includes two FC switches managed by the matrix. Servers 1–3 are connected to the first FC switch; servers 4–6 are connected to the second switch. The matrix also includes two RAID arrays, which contain multiported disks. If a managed FC switch fails, the servers connected to the other switch will survive and access to storage will be maintained.
Dual FC Ports, Dual FC Switches, Single FC Fabric

This example uses multipath I/O to eliminate single points of failure. The fabric includes two FC switches managed by the matrix. Each server has two FC ports; the first FC port connects to the first FC switch and the second FC port connects to the second FC switch. The matrix also includes two RAID arrays containing multiported disks.
Dual FC Ports, Dual FC Switches, Dual FC Fabrics

This example is similar to the previous example, but also includes dual FC fabrics, with a matrix-managed FC switch in each fabric. If one of the fabrics should fail, the servers can access the storage via the other fabric.

[Figure: servers 1–5 on an IP network and an administrative network, each with dual FC ports connected to two FC switches in separate FC fabrics; each fabric attaches to a RAID array.]
PolyServe Technical Support

PolyServe Technical Support provides technical assistance as well as product information and downloads.

Technical Assistance

If you have a technical issue with Matrix Server or a PolyServe Solution Pack, you can contact PolyServe Technical Support for assistance. For more information about PolyServe Technical Support, go to the PolyServe web site: http://www.polyserve.com/support.
When you establish a MatrixLink account, you will have access to the following:

• Product downloads and the appropriate license keys.
• Product documentation.
• Information about known issues with PolyServe products.
• The Technical Support Knowledge Base, which contains articles regarding configuring, operating, and troubleshooting PolyServe products.

You will also receive product update notifications.

Copyright © 1999-2006 PolyServe, Inc. All rights reserved.
2 Matrix Administration

PolyServe Matrix Server can be administered either with the Management Console or from the command line.

Administrative Considerations

You should be aware of the following when managing Matrix Server:

• Normal operation of the matrix depends on a reliable network hostname resolution service. If the hostname lookup facility becomes unreliable, the running matrix can experience problems.
• For best performance, we recommend that you monitor the matrix from a separate administrative station rather than from a server in the matrix. The Management Console can be installed on Linux or Windows systems.

• Matrix Server components can fail if the /var filesystem fills up. To avoid this situation, move /var/opt/polyserve to a separate partition containing at least 200 MB.
• When partitioning SCSI disks, be sure that partition 15 is the highest-numbered partition.

• Matrix Server supports 2 TB filesystems on basic volumes and up to 16 TB filesystems on dynamic volumes.

• If servers from multiple matrices can access the SAN via a shared FC fabric, avoid importing the same disk into more than one matrix. Filesystem corruption can occur when different matrices attempt to share the same filesystem.
• 64 service and/or device monitors per matrix (the total number of service and device monitors cannot exceed 64)
• 10 event notifiers per matrix
• 4 network interface cards per server
• 4 FibreChannel ports per server
• 1024 unique paths to disks in the matrix

These limits will be increased as additional testing takes place. Theoretically, the tested configuration limits can be exceeded up to the bounds of the operating system.
NOTE: For improved performance, the Management Console caches hostname lookups. If your DNS changes, you may need to restart the console so that it will reflect the new hostname.

Start the Management Console

To start the Management Console, first start the windowing environment and then type the following command:

$ mxconsole

The Login window then appears. If the window does not display properly, verify that your DISPLAY variable is set correctly.
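For example, when launching the console over a remote X session, you can set the DISPLAY variable before starting the console (the display address shown is only an illustration; substitute the address of your X server):

$ export DISPLAY=workstation.example.com:0
$ mxconsole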
Manage a Matrix from the Command Line

The mx utility allows you to manage Matrix Server from the command line. You will need to create a .matrixrc file to use the mx commands. See the Matrix Server Command Reference for more information about mx and the .matrixrc file.

PSFS filesystems can also be managed with Linux shell commands. Changes made with these commands are reflected on the Management Console.
The toolbar at the top of the window can be used to connect to or disconnect from a matrix, to add new matrix entities (servers, virtual hosts, notifiers, device monitors, service monitors, and filesystems), to mount or unmount filesystems, to import or deport disks, to collapse or expand the entity lists, and to display online help.
Virtual Hosts Tab

The Virtual Hosts tab shows all virtual hosts in the matrix. For each virtual host, the window lists the network interfaces on which the virtual host is configured, any service monitors configured on that virtual host, and any device monitors associated with that virtual host.
Applications Tab

This view shows the Matrix Server applications, virtual hosts, service monitors, and device monitors configured in the matrix and provides the ability to manage and monitor them from a single screen. The applications, virtual hosts, and monitors appear in the rows of the table. The servers on which they are configured appear in the columns.
Filesystems Tab

The Filesystems tab shows all PSFS filesystems in the matrix.
Notifiers Tab

The Notifiers tab shows all notifiers configured in the matrix.
Matrix Alerts

The Alerts section at the bottom of the Management Console window lists errors that have occurred in matrix operations. You can double-click on an alert message to see more information about the alert:

• If the complete alert message does not fit in the Description column, double-click on that column to open a window displaying all of the text.

• Double-click in the Location column to display the error on the Management Console.
Assign or Change Passwords

If you need to assign a new password or to change an existing password on a particular server, use one of these methods:

Matrix Configuration window. Select File > Configure. You can then change the admin password on the General Settings tab. If the matrix is running, you will need to change the password individually on each server.
mxinit: The mxinit daemon.
pswebsvr: The embedded web server daemon used by the Management Console and the mx utility.

Do not terminate any of these processes; they are required for Matrix Server operations.

Process Monitoring

The mxinit utility is started automatically as a daemon on each server and monitors all Matrix Server processes running there. (You can start another instance of mxinit to perform other tasks provided by the utility.)
psfs        Loaded
dlm         15913
sanpulse    15917

FibreChannel adapter module status:
    qla2300 - QLogic 2300 FibreChannel Adapter, is Loaded

The PID is displayed for running processes; "Stopped" is displayed for processes that are not running. For modules, the status specifies whether the module is loaded. "FibreChannel adapter module status" displays status for the FibreChannel adapter modules installed on the system.
Start or Stop Matrix Server with mxinit

Typically, you should use the pmxs script to start or stop Matrix Server. However, if you want to see verbose output during the start or stop operation, you can run mxinit manually with the --verbose option. You can use the following mxinit options to start or stop Matrix Server:

• -s, --start
  Start the Matrix Server processes.

• -x, --stop
  Gently stop the Matrix Server processes. mxinit first attempts to unmount PSFS filesystems.
• -M, --no-monitor
  Explicitly tell mxinit not to monitor processes.

• --hba-status
  Display the state of the FibreChannel host bus adapter drivers.

Administer init.d Scripts

When services such as NFS or Samba are configured on PSFS filesystems, Matrix Server must be started before the service is started. To ensure that Matrix Server starts before the service, modify the /etc/init.d script for the desired service and add pmxs as a dependency. In the init.
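Combining the options above, a verbose manual start and gentle stop might look like the following illustrative session (output not shown):

# mxinit --verbose --start
# mxinit --verbose --stop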
The Matrix Server configuration files are in the /etc/opt/polyserve and /var/opt/polyserve directories. You can back up either the entire directories or the following individual files, which are spread across the two directories:

• cp_conf
• mxsecret
• FCswitches
• fc_pcitable
• oem.conf
• MPdata
• fence.conf
• psSan.cfg
• run/MP.backup
• licenses/license
• scl.conf
• mxinit.conf
• snap.
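For example, both configuration directories can be archived with standard Linux tools before making configuration changes (the archive path shown is only an illustration):

# tar czf /root/mxs-config-backup.tar.gz /etc/opt/polyserve /var/opt/polyserve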
Quotas Information for PSFS Filesystems

The Linux utilities or third-party applications used to back up PSFS filesystems are not aware of the Matrix Server quotas feature and cannot back up the quotas information. Instead, when you back up a filesystem, also use the Matrix Server psfsdq command to back up the quota information stored on the filesystem. After restoring the filesystem, run the psfsrq command to restore the quota information.
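The psfsdq(8) and psfsrq(8) man pages describe the exact syntax. As an illustrative sketch only (the mount point and dump file below are assumptions, not documented syntax), the sequence pairs a quota backup with each filesystem backup:

To back up the quota data:

# psfsdq /mnt/psfs1 > /backup/psfs1.quotas

After restoring the filesystem, restore the quota data:

# psfsrq /mnt/psfs1 < /backup/psfs1.quotas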
mxmpio(8): Monitor or manage MPIO devices
mxnlmconfig(8): Enable or disable NLM locking (provided with MxFS-Linux)
mxsanlock(8): Display status of SAN ownership locks
netif(8): mx command to manipulate network interfaces
notifier(8): mx command to manipulate notifiers
psfsck(8): Check a PSFS filesystem
psfsdq(8), psfsrq(8): Back up or restore quota data for a PSFS filesystem
psfssema(8): Manage command-line semaphores
psfsinfo(8): Report PSFS filesystem information
3 Configure Servers

Before adding a server to a matrix, verify the following:

• The server is connected to the SAN if it will be accessing PSFS filesystems.

• The server is configured as a fully networked host supporting the services to be monitored. For example, if you want Matrix Server to provide failover protection for your Web service, the appropriate Web server software must be installed and configured on the servers.

• If the /etc/hosts file has been modified, it should be consistent with the DNS.
Depending on your fencing method, you might also need to specify the hostnames of the FC switches that are directly connected to the servers in the matrix. If you are using Web Management-based fencing, you may be asked for additional information about the server.
Server: Enter the name or IP address of the server.

Server Severity: When a server fails completely because of a power failure or other serious event, Matrix Server attempts to move any virtual hosts from the network interfaces on the failed server to backup network interfaces on healthy servers in the matrix.
NOTE: For improved performance, the Management Console caches hostname lookups. If your DNS changes, you may need to restart the console so that it will reflect the new hostname.

To add or update a server from the command line, use this command:

mx server add|update [--serverSeverity=autorecover|noautorecover] ...
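For example, to add a server named server1.xvz.com (a hypothetical name) and allow it to be automatically recovered after a failure:

mx server add --serverSeverity=autorecover server1.xvz.com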
To disable servers from the command line, use this command:

mx server disable ...

Enable a Server

Select the server to be enabled from the Servers window on the Management Console, right-click, and select Enable. To enable servers from the command line, use this command:

mx server enable ...

Change the IP Address for a Server

A server's IP address can be changed without affecting the other servers in the matrix.
/etc/opt/polyserve/licenses/license on each server or you can install it from the Management Console. By default, Matrix Server reads the license file upon startup of the server and at 15-minute intervals. To cause Matrix Server to read the new license file immediately, use one of these options:

• On the Management Console, select the server, right-click, and then select Refresh License.
Migrate Existing Servers to Matrix Server

In Matrix Server, the names of your servers should be different from the names of the virtual hosts they support. A virtual host can then respond regardless of the state of any one of the servers. In some cases, the name of an existing server may have been published as a network host before Matrix Server was configured.
but those requests are not protected by Matrix Server. If the server fails, requests to the server's hostname fail, whereas requests to the new virtual hostname are automatically redirected by Matrix Server to a backup server.

Configure Servers for DNS Load Balancing

Matrix Server can provide failover protection for servers configured to provide domain name service (DNS) load balancing using BIND 4.9 or later.
[Figure: servers acmd1 and acmd2; acmd1 is the primary for virtual_acmd1 and the backup for virtual_acmd2, while acmd2 is the primary for virtual_acmd2 and the backup for virtual_acmd1; all virtual host traffic is addressed to the virtual hosts.]

The addresses on the name server are virtual_acmd1 and virtual_acmd2. Two virtual hosts have also been created with those names. The first virtual host uses acmd1 as the primary server and acmd2 as the backup. The second virtual host uses acmd2 as the primary and acmd1 as the backup.
IP address: The IP addresses for the virtual hosts you will use for each server in the matrix. These are the IP addresses that the DNS will use to send alternate requests. (In this example, virtual host virtual_acmd1 uses IP address 10.1.1.1 and virtual host virtual_acmd2 uses IP address 10.1.1.2.)

With this setup, the domain name server sends messages in a round-robin fashion to the two virtual hosts indicated by the IP addresses, causing them to share the request load.
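In BIND, this round-robin behavior results from publishing both virtual host addresses under a single name. The zone name and record layout below are an illustrative sketch, not configuration taken from this guide:

; Two A records for the published service name (illustrative zone data)
www.acmd.com.   IN  A  10.1.1.1   ; virtual_acmd1
www.acmd.com.   IN  A  10.1.1.2   ; virtual_acmd2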
4 Configure Network Interfaces

When you add a server to the matrix, PolyServe Matrix Server determines whether each network interface on that server meets the following conditions:

• The network interface is up and running.
• Broadcast and multicast are enabled on the network interface.
• Each network interface card (NIC) is on a separate network.

Network interfaces meeting these conditions are automatically configured into the matrix.
For performance reasons, we recommend that these networks be isolated from the networks used by external clients to access the matrix.

When Matrix Server is started, the PanPulse daemon selects the administrative network from the available networks. When a new server joins the matrix, the PanPulse daemon on that server tries to use the established administrative network.
will fail over administrative traffic to the other interface. As a last resort, Matrix Server will fail over the traffic to an interface that discourages administrative traffic.

Virtual Hosts

A virtual host is created on a set of network interfaces. These network interfaces must be enabled for virtual hosting. By default, all network interfaces are enabled; however, you can disable a network interface if you do not want it to carry virtual host traffic.
Administrative Network Failover

An administrative network failure occurs when the interface on a particular server is no longer receiving Matrix Server administrative traffic. Some possible causes of the failure are a bad cable or network interface card (NIC). When the administrative network fails on a server, the PanPulse daemon on that server attempts to select another network to act as the administrative network.
be changed, stop Matrix Server on the affected server, shut down the network, and then make the change. When the change is complete, bring up the network and restart Matrix Server.

Add or Modify a Network Interface

When you add a server to the matrix, its network interfaces are automatically configured into the matrix. Occasionally, you may want to preconfigure a network interface.
NOTE: The definition for a network interface cannot be modified while Matrix Server is running. If any part of a network interface definition needs to be changed, stop Matrix Server on the affected server, shut down the network, and then make the change. When the change is complete, bring up the network and restart Matrix Server.
From the command line, use the following command to allow administrative traffic on specific network interfaces:

mx netif admin ...

Use the following command to discourage administrative traffic:

mx netif noadmin ...

Enable or Disable a Network Interface for Virtual Hosting

By default, all network interfaces are enabled for virtual hosting.
5 Configure the SAN

SAN configuration includes the following:

• Import or deport SAN disks. After a disk is imported, it can be used for PSFS filesystems.
• Change the partitioning on SAN disks.
• Display information about SAN disks.
• Manage multipath I/O.

Overview

SAN Configuration Requirements

Be sure that your SAN configuration meets the requirements specified in the PolyServe Matrix Server Installation Guide.
ensure filesystem integrity and then to continue running safely on the remaining set of servers. As part of managing shared SAN devices, the SCL is also responsible for providing each disk with a globally unique device name that all servers in the matrix can use to access the device.

Device Names

The SCL uses unique device names to control access to shared SAN devices. These names form the pathnames that servers use to access shared data.
be present for a matrix to form. To ensure that the database is always available, it is recommended that you create three membership partitions. You can use the mxmpconf utility to fix any problems with the membership partitions.

Device Access

Once imported, a shared device can be accessed only with its global device name, such as psd6p4. On each server, the SCL creates device node entries in the directory /dev/psd for every partition on the disk.
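For example, a PSFS filesystem on partition 4 of the imported disk psd6 would be reached through its /dev/psd node. The mount point and the filesystem type argument below are illustrative assumptions; see the filesystem chapters for the documented mount procedure:

# mount -t psfs /dev/psd/psd6p4 /mnt/shared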
When you import a disk, the SCL gives it a global device name such as psd25. It also assigns global device names to all of the partitions on the disk. The individual partitions are identified by the disk name followed by p and the partition number, such as psd25p4.

To import disks using the Management Console, select Storage > Disk > Import or click the Import icon on the toolbar.
Deport SAN Disks

Deporting a disk removes it from matrix control. The /dev/psd device nodes are removed and the original /dev entries are re-enabled. You cannot deport a disk that contains a mounted filesystem or a membership partition. Also, disks configured in a dynamic volume cannot be deported. (You will need to destroy the dynamic volume and then deport the disk.)
Change the Partitioning on a Disk

The Linux fdisk utility can be used to change the partition layout on a SAN disk. If the disk is currently imported into the matrix, you must first deport the disk. When you use fdisk, the changes made to the partition table are visible only to the server where you made the changes. When you reimport the disk, the other servers in the matrix will see the updated partition table.
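The workflow can be summarized as follows (the original device name /dev/sdc is hypothetical; deporting and reimporting can be done from the Management Console as described above):

1. Deport the disk so that its original /dev entry is re-enabled.
2. Repartition it on one server:
   # fdisk /dev/sdc
3. Reimport the disk so that all servers see the updated partition table.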
When you select a disk, the window displays information about the partitions on the disk. Select a partition to display the corresponding Linux mount path for the PSFS filesystem. To import or deport a disk, select that disk and then click Import or Deport as appropriate.

Display Disk Information with sandiskinfo

The sandiskinfo command can display information for both imported and unimported SAN disks and also for dynamic volumes.
Chapter 5: Configure the SAN 63 -U Display output in the format used by the Management Console. This option is used internally by Matrix Server and does not produce human-readable output. -q Suppress output of all log messages. Following are some examples of these options. Show Partition Information The -a option lists the partitions on each disk. When combined with -u, it displays partition information for unimported disks.
Chapter 5: Configure the SAN 64 Show Available Volumes The -v option lists available volumes on imported disks. These volumes are not currently in use for a PSFS filesystem or a membership partition.
# sandiskinfo -v
Volume: /dev/psd/psd5p1 Size: 3905M
  Disk=20:00:00:04:cf:13:32:d1::0 partition=01 type=Linux (83)
Volume: /dev/psd/psd5p2 Size: 7386M
  Disk=20:00:00:04:cf:13:32:d1::0 partition=02 type=Linux (83)
Options for Dynamic Volumes The following sandiskinfo options apply only to dynamic volumes.
Chapter 5: Configure the SAN 65 # sandiskinfo --dynvol_properties
Dynamic Volume: psv1 Size: 2439M Stripe=Unstriped
  Subdevice: 20:00:00:04:cf:13:38:18::0/5 Size: 490M psd1p5
  Subdevice: 20:00:00:04:cf:13:38:18::0/2 Size: 1950M psd1p2
Dynamic Volume: psv2 Size: 490M Stripe=32K/optimal
  Subdevice: 20:00:00:04:cf:13:38:18::0/7 Size: 490M psd1p7
Dynamic Volume: psv3 Size: 490M Stripe=8K/optimal
  Subdevice: 20:00:00:04:cf:13:38:18::0/10 Size: 490M psd1p10
Matrix Server MPIO Matrix Server uses multipath I/O (MPIO) to maintain access to SAN devices when a failure occurs in an I/O path.
Chapter 5: Configure the SAN 66 Enable or Disable Failover for a PSD Device When a failure occurs in the I/O path to a particular PSD device, Matrix Server by default fails over to another I/O path. You can use this command to control whether this behavior can occur for specific PSD devices. (Matrix Server starts with failover enabled.) # mxmpio enable|disable [PSD-device]
Chapter 5: Configure the SAN 67 An Example of Changing the I/O Path In this example, we will change the target for a device. The mxmpio status -l command identifies the path currently being used by each device. That path is labeled “active.” The following output shows that device psd2p1 is active on target 1.
# /opt/polyserve/sbin/mxmpio status -l
MPIO Failover is globally enabled
        Failover  Timeout  Targets
psd1    enabled   30000    0. (41:50) scsi2/0/2/19
                           1. (08:90) scsi1/0/2/19
psd1p1  enabled   10000    0.
Chapter 5: Configure the SAN 68 Display Status Information The status command displays MPIO status information, including the timeout value, whether MPIO is enabled (globally and per-device), and any targets specified with the active command. Use the -l option to display more information about the targets, as in the above example.
Chapter 5: Configure the SAN 69 Set the Timeout Value The default timeout period for PSD devices is 30 seconds. If you need to modify this value for a particular PSD device, use the following command. The value is specified in milliseconds; however, the smallest unit is 10 milliseconds. A value of zero disables timeouts. # mxmpio timeout value [PSD-device] Other MPIO Support Enable the MPIO Failover Feature on QLogic Drivers QLogic device drivers have an MPIO failover feature.
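Because mxmpio accepts timeouts only in 10-millisecond units, a requested value may need to be rounded down first. The following sketch is illustrative only (the mxmpio call itself requires a running matrix, so it is shown as a comment); the variable names are ours, not part of the product.

```shell
# Round a desired timeout down to the 10 ms granularity mxmpio accepts.
requested=12345                    # desired timeout in milliseconds
granularity=10                     # smallest unit accepted by mxmpio
timeout=$(( requested / granularity * granularity ))
echo "$timeout"                    # prints 12340
# The resulting call would then be, for example:
#   mxmpio timeout $timeout psd1p1
```

A value of 0 would disable timeouts entirely, as described above.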
Chapter 5: Configure the SAN 70 • Options, enclosed in double quotes, to pass to insmod when it loads the driver. If no options are required, type a pair of double quotes ("") in the field. • A text description of the driver. Edit the fc_pcitable File To enable the failover feature, you will need to edit the fc_pcitable file. In the file, locate the line for your device driver. (For version 8.00.00 and later drivers, the option is in the qla2xxx module.)
Chapter 5: Configure the SAN 71 scanning process in the hope that the third-party MPIO software will discover and manage the devices. If the process succeeds, Matrix Server will continue to start normally. If the third-party software is unable to discover the target devices during this process, Matrix Server causes the node to reboot a second time. The goal of the second reboot is to allow the third-party MPIO software a better chance to discover devices before any other dependent software starts.
6 Configure Dynamic Volumes Matrix Server includes a Volume Manager that you can use to create, extend, recreate, or destroy dynamic volumes. Dynamic volumes allow large filesystems to span multiple disks, LUNs, or storage arrays. Overview Basic and Dynamic Volumes Volumes are used to store PSFS filesystems. There are two types of volumes: dynamic and basic. Dynamic volumes are created by the Volume Manager.
Chapter 6: Configure Dynamic Volumes 73 Types of Dynamic Volumes Matrix Server supports two types of volumes: striped and concatenated. The volume type determines how data is written to the volume. • Striping. When a dynamic volume is created with striping enabled, a specific amount of data (called the stripe size) is written to each subdevice in turn. For example, a dynamic volume could include three subdevices and a stripe size of 64 KB.
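The round-robin placement described above can be sketched with simple arithmetic: stripe N lands on subdevice N mod (number of subdevices). This is an illustrative model of the layout, not a PolyServe utility; the numbers match the three-subdevice, 64 KB example in the text.

```shell
# Model of striped placement: with 3 subdevices and a 64 KB stripe size,
# stripe N is written to subdevice N mod 3.
stripe_kb=64
subdevices=3
offset_kb=200                           # a byte offset 200 KB into the volume
stripe=$(( offset_kb / stripe_kb ))     # offset falls in stripe index 3
subdev=$(( stripe % subdevices ))       # stripe 3 lands on subdevice 0
echo "stripe $stripe -> subdevice $subdev"
```

So an offset of 200 KB falls in the fourth stripe (index 3), which wraps back to the first subdevice.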
Chapter 6: Configure Dynamic Volumes 74 Guidelines for Creating Dynamic Volumes When creating striped dynamic volumes, follow these guidelines: • The subdevices used for a striped dynamic volume should be the same size. The Volume Manager uses the same amount of space on each subdevice in the stripeset. When a striped dynamic volume is created, the Volume Manager determines the size of the smallest specified subdevice and then uses only that amount of space on each subdevice.
Chapter 6: Configure Dynamic Volumes 75 Filesystem: If you want Matrix Server to create a filesystem that will be placed on the dynamic volume, enter a label to identify the filesystem. If you do not want a filesystem to be created, remove the checkmark from “Create filesystem after volume creation.” If you are creating a filesystem, you can also select the options to apply to the filesystem. Click the Options button to see the Filesystem Options dialog.
Chapter 6: Configure Dynamic Volumes 76 If you enable quotas, you can set the default quota for users and groups on that filesystem. If you do not want a default limit, click “Unlimited.” To assign a limit, click “Limit” and then specify the appropriate size in kilobytes, megabytes, gigabytes, or terabytes. The defaults are rounded down to the nearest filesystem block. The default quota applies to all users and groups; however, you can change the quota for a specific user or group.
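The round-down behavior mentioned above is easy to see with a quick calculation. PSFS uses a 4 KB block size, so a limit that is not a multiple of 4096 bytes is truncated to the block boundary; this is an illustrative sketch, not a product command.

```shell
# Default quotas are rounded down to the nearest filesystem block (4 KB).
block=4096
limit=10000                           # a requested limit of 10,000 bytes
rounded=$(( limit / block * block ))  # truncate to a 4 KB boundary
echo "$rounded"                       # prints 8192
```

A requested 10,000-byte default would therefore be stored as 8192 bytes (two blocks).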
Chapter 6: Configure Dynamic Volumes 77 When the dynamic volume has been created, the Management Console reports the psv name assigned to the volume. On the Management Console, filesystems are identified with the psv name of the dynamic volume on which they are located, as shown below. To create a dynamic volume from the command line, use the following command: mx dynvolume create [--stripesize <4KB-64MB>]
Chapter 6: Configure Dynamic Volumes 78 The Stripe State reported in the “Dynamic Volume Properties” section will be one of the following: • Unstriped. The volume is concatenated and striping is not in effect. • Optimal. The volume has only one stripeset that includes all subdevices. Each subdevice is written to in turn. • Suboptimal. The volume has been extended and includes more than one stripeset.
Chapter 6: Configure Dynamic Volumes 79 Extend a Dynamic Volume The Extend Volume option allows you to add subdevices to an existing dynamic volume. When you extend the volume on which a filesystem is mounted, you can optionally increase the size of the filesystem to fill the size of the volume. NOTE: The subdevices used for a striped dynamic volume are called a stripeset. When a striped dynamic volume is extended, the new subdevices form another stripeset.
Chapter 6: Configure Dynamic Volumes 80 Dynamic Volume Properties: The current properties of this dynamic volume. Filesystem Properties: The properties for the filesystem located on this dynamic volume. Available Subdevices: Select the additional subdevices to be added to the dynamic volume. Use the arrow keys to reorder those subdevices if necessary. Extend Filesystem: To increase the size of the filesystem to match the size of the extended volume, click this checkbox.
Chapter 6: Configure Dynamic Volumes 81 Destroy a Dynamic Volume When a dynamic volume is destroyed, the filesystem on that volume, and any persistent mounts for the filesystem, are also destroyed. Before destroying a dynamic volume, be sure that the filesystem is no longer needed or has been copied or backed up to another location. The filesystem must be unmounted when you perform this operation. To destroy a dynamic volume from the Management Console, select Storage > Dynamic Volume > Destroy Volume.
Chapter 6: Configure Dynamic Volumes 82 another location. The filesystem must be unmounted when you recreate the volume. To recreate a dynamic volume on the Management Console, select Storage > Dynamic Volume > Recreate Volume and then choose the volume that you want to recreate. If a filesystem is mounted on the volume, the Recreate Dynamic Volume window shows information for both the dynamic volume and the filesystem.
Chapter 6: Configure Dynamic Volumes 83 To recreate a volume from the command line, you will first need to use the dynvolume destroy command and then run the dynvolume create command. Convert a Basic Volume to a Dynamic Volume If you have PSFS filesystems that were created directly on an imported disk partition or LUN (a basic volume), you can convert the basic volume to a dynamic volume.
Chapter 6: Configure Dynamic Volumes 84 To convert a basic volume to a dynamic volume from the command line, use the following command: mx dynvolume convert
7 Configure PSFS Filesystems PolyServe Matrix Server provides the PSFS filesystem. This direct-access shared filesystem enables multiple servers to concurrently read and write data stored on shared SAN storage devices. A journaling filesystem, PSFS provides live crash recovery.
Chapter 7: Configure PSFS Filesystems 86 The PSFS filesystem does not migrate processes from one server to another. If you want processes to be spread across servers, you will need to take the appropriate actions. For example, if you want to spread a kernel build across four servers, you will need to run a cooperative make.
Chapter 7: Configure PSFS Filesystems 87 Filesystem Management and Integrity Matrix Server uses the SANPulse daemon to manage PSFS filesystems. SANPulse performs the following tasks: • Coordinates filesystem mounts, unmounts, and crash recovery operations. • Checks for matrix partitioning, which can occur when matrix network communications are lost but the affected servers can still access the SAN.
Chapter 7: Configure PSFS Filesystems 88 Crash Recovery When a server using a PSFS filesystem either crashes or stops communicating with the matrix, another server in the matrix will replay the filesystem journal to complete any transactions that were in progress at the time of the crash. Users on the remaining servers will notice a slight delay while the journal is replayed. Typically the recovery procedure takes only a few seconds. The recovery process restores only the structural metadata information.
Chapter 7: Configure PSFS Filesystems 89 Filesystem Restrictions The following restrictions apply to the PSFS filesystem: • A PSFS filesystem cannot be used as the root or /boot filesystem. • A server can mount another non-shared filesystem on a directory of a PSFS filesystem; however, that filesystem will be local to the host. It will not be mounted on other hosts in the matrix. • PSFS filesystems cannot be mounted using the Linux loop device.
Chapter 7: Configure PSFS Filesystems 90 Create a Filesystem A PSFS filesystem can be created on a basic volume (a psd device) or a dynamic volume (a psv device). The maximum filesystem size is 16 TB, which requires a dynamic volume. PSFS filesystems use 4 KB as the block size. You can create a filesystem from one server in the matrix using either the Management Console or the command line. The filesystem can then be mounted on all servers in the matrix that can access it via the SAN.
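The 16 TB maximum and the fixed 4 KB block size together imply a block count of 2^32, which is consistent with a 32-bit block address. The arithmetic can be checked directly (illustrative only; not a product command):

```shell
# Maximum PSFS filesystem size (16 TB) divided by the 4 KB block size.
max_bytes=$(( 16 * 1024 * 1024 * 1024 * 1024 ))   # 16 TB in bytes
block=4096
blocks=$(( max_bytes / block ))
echo "$blocks"                                     # prints 4294967296 (2^32)
```
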
Chapter 7: Configure PSFS Filesystems 91 Label: Type a label that identifies the filesystem. Available Volumes: This part of the window lists the basic or dynamic volumes that are currently unused. Select one of these volumes for the filesystem. Click the Options button to see the Filesystem Options window, which lists the available options for the filesystem. The General tab specifies how the filesystem block size is used. Currently, the block size must be 4 KB.
Chapter 7: Configure PSFS Filesystems 92 The Quotas tab allows you to specify whether disk quotas should be enabled on this filesystem. If you enable quotas, you can set the default quota for users and groups on that filesystem. If you do not want a default limit, click “Unlimited.”
Chapter 7: Configure PSFS Filesystems 93 To assign a limit, click “Limit” and then specify the appropriate size in kilobytes, megabytes, gigabytes, or terabytes. The defaults are rounded down to the nearest filesystem block. The default user and group quotas apply to all users and groups, respectively; however, you can change the quota for a specific user or group. See Chapter 8, “Manage Filesystem Quotas” on page 120 for more information.
Chapter 7: Configure PSFS Filesystems 94 The mkpsfs Command The mkpsfs command creates a PSFS filesystem on the specified device, which must be imported into the matrix. PSFS filesystems use a block size of 4 KB.
Chapter 7: Configure PSFS Filesystems 95 The -o option has the following parameters: • disable-fzbm Create the filesystem without Full Zone Bit Maps (FZBM). See the PolyServe Knowledge Base article “Using the FZBM On-Disk Filesystem Format” for more information about this feature. • enable-quotas Enables quotas on the filesystem. • userdefault= and groupdefault= Set the default quota limit for users or groups, respectively, to size bytes.
Chapter 7: Configure PSFS Filesystems 96 If you want some servers to have read-write access while other servers have read-only access, mount the filesystem read-write across the matrix. Then, on the servers that should have read-only access, change the permissions on the mountpoint to r-x. NOTE: The /etc/fstab file cannot be used to mount PSFS filesystems automatically when the server is rebooted.
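The mixed-access technique above relies on ordinary directory permissions: the filesystem is mounted read-write everywhere, and the mountpoint is set to r-x on servers that should only read. The effect of the r-x permission bits can be demonstrated on any directory (this sketch uses a temporary directory, not a real PSFS mountpoint):

```shell
# Demonstrate the r-x (555) permissions used for per-server read-only access.
mnt=$(mktemp -d)            # stand-in for a mountpoint directory
chmod 555 "$mnt"            # r-xr-xr-x: no new files can be created through it
perms=$(stat -c %a "$mnt")
echo "$perms"               # prints 555
chmod 755 "$mnt" && rmdir "$mnt"   # restore write permission so cleanup works
```

On a real matrix you would run the chmod on the mountpoint only on the servers that should have read-only access.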
Chapter 7: Configure PSFS Filesystems 97 On Servers: Select the servers where the filesystem is to be mounted. Shared Mount Options This option applies to all servers on which the filesystem is mounted. Read/Write or Read Only. Mount the filesystem read-write or read-only. Read/Write is the default. Server Mount Options These mount options can be different on each server. Mount point: Type the directory mount point for the filesystem. Activate: To mount the filesystem now, click Activate.
Chapter 7: Configure PSFS Filesystems 98 I/O buffering, allowing disk transfers to occur directly in application buffers. In database server terms, this means the I/O will only be buffered in the address space of the database server processes. For example, in the case of Oracle9i, I/O will be buffered in the SGA or PGA. This eliminates the “double-buffering” overhead associated with traditional filesystems.
Chapter 7: Configure PSFS Filesystems 99 quorum disk, the Oracle GCS and SPFILE, and so on) and for tools (for example, backup tools) that manipulate database objects. Database operational functions such as compressing a tablespace that is being transported, copying datafiles, and so forth are also supported. With the DB Optimized mount option, there is no operating system readahead.
Chapter 7: Configure PSFS Filesystems 100 The advanced mount options are as follows: Shared or Exclusive. Either allow all servers having physical access to the filesystem to mount it or allow only one server. Shared is the default. Ordered or Unordered. The Ordered option provides additional security for writes to the filesystem. If a metadata operation will allocate user blocks, the user blocks are written to the filesystem before the metadata is written.
Chapter 7: Configure PSFS Filesystems 101 The --persist argument causes the filesystem to be mounted automatically whenever the server is rebooted. The --activate argument mounts the filesystem now. The --path argument specifies the directory mountpoint. See the Matrix Server Command Reference for a description of the options. The Linux mount Command Use the following syntax to mount a filesystem. The directory mountpoint must already exist.
Chapter 7: Configure PSFS Filesystems 102 • Use of the remount option to change the way a filesystem is mounted. For example, you cannot use mount -o remount,rw to remount a filesystem as read-write if it was originally mounted read-only. The mount operation ignores any options that are not supported by the PSFS filesystem. See mount_psfs(8) for more information. Unmount a Filesystem You can unmount a PSFS filesystem from either the Management Console or the command line.
Chapter 7: Configure PSFS Filesystems 103 mx fs unmount [--persistent] [--active] ALL_SERVERS|<server> ... The Linux umount command. Be sure to specify the mountpoint, such as /mnt/data1, not the partition. umount <mountpoint> Persistent Mounts When you mount a filesystem on a server, you can specify that it should be remounted automatically whenever the server is rebooted. When you configure a filesystem mount in this manner, it is a “persistent” mount.
Chapter 7: Configure PSFS Filesystems 104 • To remove the “persistent” mount status for one or more filesystems, select those filesystems and then click Delete. • To mount a filesystem with the options specified for the persistent mount, select that filesystem and then click Activate. Persistent Mounts for a Filesystem To see all persistent mounts for a particular filesystem, select that filesystem on the Filesystems window, right-click, and select Edit Persistent Mounts.
Chapter 7: Configure PSFS Filesystems 105 View or Change Filesystem Properties To see information about a specific filesystem, select that filesystem, right-click, and select Properties. Label: This field specifies the label that is assigned to the filesystem. If the filesystem does not have a label, the field will be blank. You can change the label if necessary. Volume Tab On the Properties window, the Volume tab provides information about the storage device and allows you to extend the filesystem.
Chapter 7: Configure PSFS Filesystems 106 of the volume. When you click on the Extend Filesystem button, you will see a warning such as the following. When you click Yes, Matrix Server will extend the filesystem to use all of the available space. If the filesystem is on a dynamic volume, you can use the Extend Volume option to increase the size of both the dynamic volume and the filesystem. Features Tab The Features tab shows whether Full Zone Bit Maps (FZBM) or quotas are enabled on the filesystem.
Chapter 7: Configure PSFS Filesystems 107 Quotas Tab The Quotas tab allows you to enable or disable quotas on the filesystem, to set the default hard limit for users and groups, and to view or modify the quotas for individual users and groups. If you enable quotas, you can set the default quota for users and groups on that filesystem. If you do not want a default limit, click “Unlimited.”
Chapter 7: Configure PSFS Filesystems 108 View Filesystem Status The Filesystems tab on the Management Console lists information about all filesystems in the matrix. If a filesystem is mounted, the entry for that filesystem lists the servers where it is mounted and the mountpoints. The following example shows that filesystem 1p3 is mounted at /psd1p3 on four servers.
Chapter 7: Configure PSFS Filesystems 109 View Filesystem Errors for a Server The Management Console can display information about the last error or event that occurred on a particular filesystem mount. To see the information, select the filesystem and then select the server. Right-click, and select View Last Error. Check a Filesystem for Errors If a filesystem is not unmounted cleanly, the journal will be replayed the next time the filesystem is mounted to restore consistency.
Chapter 7: Configure PSFS Filesystems 110 For more information about the check, click the Details button. If psfsck locates errors that need to be repaired, it will display a message telling you to run the utility from the command line. For more information, see the PolyServe Matrix Server Command Reference or the psfsck(8) man page. CAUTION: We strongly recommend that you make a backup copy of the entire partition before you attempt to run psfsck with the --rebuild-tree option.
Chapter 7: Configure PSFS Filesystems 111 Because atime updates can have a performance impact, and to maintain the same behavior as in previous versions of the PSFS filesystem, the atime update feature is disabled by default. If needed, the feature can be enabled on specific nodes. A filesystem mounted on those nodes will perform atime updates unless the feature is disabled specifically for that filesystem at mount time. NOTE: Access times are never updated on read-only filesystems.
Chapter 7: Configure PSFS Filesystems 112 Suspend a Filesystem for Backups The psfssuspend utility suspends a PSFS filesystem in a stable, coherent, and unchanging state. While the filesystem is in this state, you can copy it for backup and/or archival purposes. When copying directly from a suspended device, be sure to use the raw device (/dev/rpsd/...) to ensure that all blocks copied are up-to-date.
Chapter 7: Configure PSFS Filesystems 113 NOTE: If an attempt to mount the copied filesystem fails with an “FSID conflict” error, run the following command as user root. In the command, <device> is the psd or psv device, such as /dev/psd/psd1p7 or /dev/psv/psv1, containing the copied filesystem, and
Chapter 7: Configure PSFS Filesystems 114 • Specify the size in kilobytes, megabytes, gigabytes, or terabytes: -s size[K|M|G|T] • Specify the amount (in kilobytes, megabytes, gigabytes, or terabytes) by which the filesystem should be increased: -s [+|-]size[K|M|G|T] The following example increases the size of the filesystem by 1 GB. resizepsfs -s +1G /dev/psd/psd6p4 NOTE: If you do not specify any options, resizepsfs will try to resize the filesystem to the full size of the partition.
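The size argument accepted by -s can be read as a number with an optional sign and unit suffix. The helper below is a hypothetical illustration of that notation (it is not part of resizepsfs); it converts a suffixed size such as +1G to bytes using binary units.

```shell
# Hypothetical helper: convert a resizepsfs-style size (e.g. "+1G") to bytes.
to_bytes() {
  local n=${1#[+-]}       # strip a leading + or - sign
  case $n in
    *K) echo $(( ${n%K} * 1024 )) ;;
    *M) echo $(( ${n%M} * 1024 * 1024 )) ;;
    *G) echo $(( ${n%G} * 1024 * 1024 * 1024 )) ;;
    *T) echo $(( ${n%T} * 1024 * 1024 * 1024 * 1024 )) ;;
    *)  echo "$n" ;;      # no suffix: already in bytes
  esac
}
to_bytes +1G              # prints 1073741824
```

So the "+1G" in the example above asks for an increase of 1,073,741,824 bytes.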
Chapter 7: Configure PSFS Filesystems 115 You can then remount the filesystem: • If a persistent mount is associated with the filesystem, you can activate the mount from the Management Console. Right-click the filesystem that you have unmounted and select “Edit Persistent Mounts.” Then, on the Edit Persistent Mounts window, select the nodes where the filesystem should be mounted and click Activate.
Chapter 7: Configure PSFS Filesystems 116 NOTE: CDSLs will not work if they are accessed through NFS because NFS resolves the link on the client. Examples Locate a Target by Its Hostname This example uses three servers: serv1, serv2, and serv3. Each server must have specific configuration files in the /oracle/etc directory. You can use a CDSL to simplify accessing these server-specific files. 1. Create a subdirectory for each server in /oracle, which is a PSFS filesystem.
Chapter 7: Configure PSFS Filesystems 117 it returns alpha. We need separate /oracle/bin and /oracle/sbin directories for each machine type. You can use CDSLs to simplify accessing these machine-specific directories. 1. Create a subdirectory in /oracle for each machine type and then create a bin and sbin directory in the new machine-type directories. You now have the following directories in the /oracle PSFS filesystem: /oracle/i386/bin /oracle/i386/sbin /oracle/alpha/bin /oracle/alpha/sbin 2.
Chapter 7: Configure PSFS Filesystems 118 2. Populate the new directories with the appropriate files. 3. Create the CDSL: ln -s /etc/{HOSTNAME} /oracle/etc When you are logged in on serv1, the /oracle/etc symbolic link will point to /etc/serv1.xvz.com. On serv2, it will point to /etc/serv2.xvz.com. Matrix-Wide File Locking Matrix Server supports matrix-wide locks on files located on PSFS filesystems.
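A CDSL like the ones in the examples above is an ordinary symbolic link whose target text contains the {HOSTNAME} token; only PSFS resolves the token per host. This sketch creates such a link on a regular local filesystem, where the token is stored literally, just to show what the link itself looks like:

```shell
# Create a CDSL-style link; on a non-PSFS filesystem the {HOSTNAME}
# token is not resolved, so readlink shows the stored target verbatim.
work=$(mktemp -d)
ln -s '/etc/{HOSTNAME}' "$work/etc"
target=$(readlink "$work/etc")
echo "$target"                      # prints /etc/{HOSTNAME}
rm "$work/etc" && rmdir "$work"
```

On a PSFS filesystem, following the link would substitute the local host's name, so serv1 and serv2 would each see their own /etc/<hostname> directory.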
Chapter 7: Configure PSFS Filesystems 119 An error is returned if the semaphore file does not exist or has not been initialized by psfssema -i, or if the specified file does not exist. Unlock a Semaphore To unlock a PSFS command-line semaphore, use this command: $ psfssema -r <semaphore-file> The command unlocks the PSFS command-line semaphore associated with <semaphore-file>, which is a semaphore file created by psfssema -i.
8 Manage Filesystem Quotas The PSFS filesystem supports disk quotas, including both hard and soft limits. After quotas are enabled on a filesystem, you can use the Quotas editor provided with the Management Console to view or set quotas for specific users and groups. Hard and Soft Filesystem Limits The PSFS filesystem supports both hard and soft filesystem quotas for users and groups.
Chapter 8: Manage Filesystem Quotas 121 When you create a PSFS filesystem, you will need to specify whether quotas should be enabled or disabled on that filesystem. (See “Create a Filesystem” on page 90.) Quotas can also be enabled or disabled on an existing filesystem. The filesystem must be unmounted. Locate the filesystem on the Management Console, right-click, and select Properties. Then go to the Quotas tab on the Properties dialog. Check or uncheck “Enable quotas” as appropriate.
Chapter 8: Manage Filesystem Quotas 122 default limit, click “Unlimited.”) The default quotas apply to all users and groups; however, you can change the quota for a specific user or group. To enable or disable quotas from the command line, use the mx quota command or the psfsquota or psfsck utilities. The psfsquota and psfsck utilities provide the following options to enable or disable quotas on a PSFS filesystem: • --enable-quotas Build the necessary quota infrastructure on the specified filesystem.
Chapter 8: Manage Filesystem Quotas 123 The following example enables quotas on volume psv1 and sets the default user and group quotas to 20 gigabytes. /opt/polyserve/sbin/psfsquota --enable-quotas --set-udq 20G --set-gdq 20G psv1 For more information about these utilities and the mx quota command, see the PolyServe Matrix Server Command Reference.
Chapter 8: Manage Filesystem Quotas 124 The default display includes columns for the name and common name of the user or group, the hard limit, the disk space currently used, and the percent of the hard limit that has been reached. NOTE: When an asterisk (*) appears in the “Hard Limit” column of the quota report, it means that the hard limit is set to the default for the filesystem. You can add columns to the display for the user or group ID and the soft limit.
Chapter 8: Manage Filesystem Quotas 125 Quota Searches You can use the search feature on the left side of the quota editor to locate quota information for specific users or groups. If you are searching by name, the quota information must be in a database (such as a password file or LDAP database) that can be accessed from the server where the filesystem is mounted. The search locates the name in the database and matches it with the ID, which is the value stored on the filesystem.
Chapter 8: Manage Filesystem Quotas 126 The basic search procedure is as follows: • Enter a search pattern if desired. If the pattern is a regular expression, click on “Regular Expression.”
Chapter 8: Manage Filesystem Quotas 127 View or Change Limits for a User or Group To see the limits assigned to a particular user or group, highlight that user (or group) on the Quotas dialog and then click the Properties icon on the toolbar. You can set the hard and soft quota limits as necessary. Add Quotas for a User or Group To assign quotas to a user or group, click the Add button on the Quota editor toolbar and then search for the user or group in the database.
Chapter 8: Manage Filesystem Quotas 128 If you know the user or group ID and want to skip the search (or if the LDAP or password file is missing), click Advanced and enter the ID on the Advanced User/Group Add dialog. You can specify ranges of IDs or a list of IDs separated by commas. Then select the type of search (User or Group) and click Add.
Chapter 8: Manage Filesystem Quotas 129 NOTE: If the user or group has been assigned quotas on another filesystem, you can highlight the entry for that user or group on that filesystem and then select Edit > Insert to open the Add Quota dialog. When the Add Quota dialog appears, select the appropriate filesystem and set the quota limits. Any existing quota limits on the filesystem will be overwritten. When you click OK, the quota limits will be assigned to the user or group.
Chapter 8: Manage Filesystem Quotas 130 Remove Quotas for a User or Group If a particular user or group no longer owns files on a filesystem, you can remove the quotas for that user or group. Select the user (or group) on the Quotas dialog and then click the Delete icon on the toolbar. NOTE: The quotas cannot be removed if the user or group has blocks allocated on the filesystem.
Chapter 8: Manage Filesystem Quotas 131 NOTE: You do not need to install this RPM to use quotas with PSFS filesystems. With the exception of warnquota, all of the functionality of these commands is provided by the Quotas window on the Management Console and by the mx quota commands. Be sure to invoke the Matrix Server versions of these commands instead of the versions provided with the Linux distribution.
Chapter 8: Manage Filesystem Quotas 132 The -f option specifies the file that psfsdq should write and that psfsrq should read for the quota data. If this option is not specified, psfsdq writes to stdout and psfsrq reads from stdin. filesystem is the psd or psv device used for the filesystem. Examples The following command saves the quota information for the filesystem located on device psd1p5. # psfsdq -f psd1p5.quotadata psd1p5 The next command restores the data to the filesystem: # psfsrq -f psd1p5.quotadata psd1p5
9 Manage Hardware Snapshots Matrix Server provides support for taking hardware snapshots of PSFS filesystems. The subdevices on which the filesystems are located must reside on one or more storage arrays that are supported for snapshots. Snapshot support can be configured either on the Management Console “Configure Matrix” window or via the mxconfig utility. (See the PolyServe Matrix Server Installation Guide for more information.) This procedure creates a snapshot configuration file on each server.
Chapter 9: Manage Hardware Snapshots 134 when source filesystem data is changed. Snapshots can be fully-allocated (the maximum amount of storage space is reserved at creation time), or demand-allocated (the minimum amount of storage space is reserved at creation time, allocating more as necessary). Snapshots are intended to be temporary. A snapclone is similar to a snapshot, except that it completely copies the source filesystem data at a particular point in time.
Chapter 9: Manage Hardware Snapshots 135 snapshot or snapclone is assigned a Matrix Server psd or psv device name. In the following example, the first filesystem entry is a snapclone. The second entry is a regular filesystem, and is followed by a snapshot.
Chapter 9: Manage Hardware Snapshots 136 To delete a snapshot, select the snapshot on the Management Console, right-click, and select Delete. To delete a snapshot from the command line, type the following: mx snapshot destroy <snapshot> Snapclone devices, like regular filesystem LUNs, cannot be deleted from the Management Console. To delete snapclones, you must destroy the filesystem and/or volume, deport the LUN(s), and delete the LUN(s) with the array-specific utilities.
10 Matrix Operations on the Applications Tab The Applications tab on the Management Console shows all Matrix Server applications, virtual hosts, service monitors, and device monitors configured in the matrix and enables you to manage and monitor them from a single screen. Applications Overview An application provides a way to group associated matrix resources (virtual hosts, service monitors, and device monitors) so that they can be treated as a unit.
If you do not specify an application name for a virtual host, the application uses the same name as the virtual host. Similarly, if you do not specify an application name for a device monitor, the application will use the same name as the monitor. The Applications Tab The Management Console lists applications and their associated resources (virtual hosts, service and device monitors) on the Applications tab. The applications and resources appear in the rows of the table. (Double-click on a resource to see its properties.)
The icons used on the Applications tab report the status of the servers, applications, and resources. The following icons are used in the server columns to indicate the status of applications and resources: Up/Okay, Down, Starting, Unknown, and Stopping. Servers also use these icons when they are in a state other than Up/Okay. Virtual hosts and single-active monitors use the following icons to indicate the primary and backups.
In the following example, server 99.11.6.4 is down. The status for the first application, 99.11.6.200, is Ok because clients are accessing the application through the primary server. The down server is a backup and does not affect client access. Application dg, however, is reporting an error. The Virtual NFS Service 99.11.6.203 is transitioning up, as indicated by the yellow arrow.
Filter the Applications Display You can use filters to limit the information appearing on the Applications tab. For example, you may want to see only a certain type of monitor, or only monitors that are down or disabled. To add a filter, click the “New Filter” tab and then configure the filter. Name: Specify a name for this filter.
Click OK to close the filter. The filter then appears as a separate tab and will be available to you when you connect to any cluster. To modify an existing filter, select that filter, right-click, and select Edit Filter. To remove a filter, select the filter, right-click, and select Delete Filter.
When you reach a cell that accepts drops, the cursor will change to an arrow. The following drag and drop operations are allowed. Applications These operations are allowed only for applications that include at most one virtual host. • Assign an application to a server. Drag the application from the Name column to the empty cell for the server.
• Switch the primary and backup servers (or two backup servers) for a virtual host. Drag the virtual host from one server cell to the cell for the other server. If the virtual host is active, this operation can disconnect existing applications that depend on the virtual host. When the operation is complete, the ordering for failover will be switched. • Remove a virtual host from a server.
reordered as necessary. If the monitor was multi-active, it will remain active on any other servers on which it is configured. (The device monitor cannot be removed via drag and drop if it is configured on only one server.) Menu Operations Applications The following operations affect all entities associated with a Matrix Server application.
Virtual Hosts When you right-click on a virtual host, you can perform the same operations as are available on the Servers or Virtual Hosts tab. • Re-host, or move, the virtual host to another node in the matrix. • Add a service monitor. • Enable or disable the virtual host. • View or change the properties for the virtual host. • Delete the virtual host.
11 Configure Virtual Hosts Matrix Server uses virtual hosts to provide failover protection for servers and network applications. Overview A virtual host is a hostname/IP address configured on a set of network interfaces. Each interface must be located on a different server. The first network interface configured is the primary interface for the virtual host. The server providing this interface is the primary server.
Matrix Health and Virtual Host Failover To ensure the availability of a virtual host, Matrix Server monitors the health of the administrative network, the active network interface, and the underlying server. If you have created service or device monitors, those monitors periodically check the health of the specified services or devices.
The failover operation to another network interface has minimal impact on clients. For example, if clients were downloading Web pages during the failover, they would receive a “transfer interrupted” message and could simply reload the Web page. If they were reading Web pages, they would not notice any interruption. If the active network interface fails, only the virtual hosts associated with that interface are failed over.
Add or Modify a Virtual Host To add or update a virtual host, select the appropriate option: • To add a new virtual host, select Matrix > Virtual Host > Add Virtual Host or click the V-Host icon on the toolbar. Then configure the virtual host on the Add Virtual Host window. • To update an existing virtual host, select that virtual host on either the Servers or Virtual Hosts window, right-click, and select Properties.
Always active: If you check this box, upon server failure, the virtual host will move to an active server even if all associated service and device monitors are inactive or down. If the box is not checked, failover will not occur when all associated service and device monitors are inactive or down and the virtual host will not be made active anywhere. (See “Virtual Host Activeness Policy” on page 155 for details.)
Available:/Members: The Available column lists all network interfaces that are available for this virtual host. If you are configuring the virtual host on all servers in the matrix, move the network interface for the primary server to the Members column. The corresponding network interfaces on the remaining servers will then be moved to the Members column automatically.
Other Virtual Host Procedures Delete a Virtual Host Select the virtual host to be deleted on either the Servers window or the Virtual Hosts window, right-click, and select Delete. Any service monitors configured on that virtual host are also deleted. To delete a virtual host from the command line, use this command: mx vhost delete <vhostname> ...
Rehost a Virtual Host You can use the Applications tab to modify the configuration of a virtual host. For example, you might want to change the primary for the virtual host. To rehost a virtual host, right-click in a cell for that virtual host and then select Rehost. You will then see a message warning that this action will cause clients of the application to lose their connections. When you continue, the Virtual Host Re-Host window appears.
When a server is “healthy,” all of the services associated with the virtual host are up and enabled. When certain events occur on the server where a virtual host is located, the ClusterPulse process will attempt to fail over the virtual host to another server configured for that virtual host. For example, if the server goes down, ClusterPulse will check the health of the other servers and then determine the best location for the virtual host.
• The PanPulse process controls whether a network interface is marked up or down. When PanPulse determines that an interface currently hosting a virtual host is down, ClusterPulse will begin searching for another server on which to locate the virtual host. 3. ClusterPulse narrows the list to those servers without inactive, down, or disabled Matrix Server device monitors. If there are no servers that meet these criteria, the virtual host is not made active anywhere.
Specify Failover/Failback Behavior The Probe Severity setting allows you to specify whether a failure of the service or device monitor probe should cause the virtual host to fail over. For example, you could configure a custom device monitor to watch a router. The device monitor probe might occasionally time out because of heavy network traffic to the router; however, the router is still functioning.
• For service monitors, you can assign a priority to each monitor (the Service Priority setting). If ClusterPulse cannot locate an interface where all services are “up” on the underlying server, it selects an interface where the highest-priority service is “up” on the underlying server.
• After the virtual host fails over to node 2, a service monitor probe fails on that node. Now both nodes have a down service monitor. Failback does not occur because the servers are equally healthy. If the failed service is then restored on node 1, that node will now be healthier than node 2 and failback will occur. (Note that if the virtual host policy was AUTOFAILBACK, failback would occur when the probe failed on node 2 because both servers were equally healthy.)
12 Configure Service Monitors Service monitors are typically used to monitor a network service such as HTTP or FTP. If a service monitor indicates that a network service is not functioning properly on the primary server, Matrix Server can transfer the network traffic to a backup server that also provides that network service. Overview Before creating a service monitor for a particular service, you will need to configure that service on your servers.
removed from that server. Service monitor parameters (such as probe severity, Start scripts, and Stop scripts) are consistent across all servers configured for a virtual host. Service Monitors and Failover If a monitored service fails, Matrix Server attempts to relocate any virtual hosts associated with the service monitor to a network interface on a healthier server.
FTP Service Monitor By default the FTP service monitor probes TCP port 21 of the virtual host address. You can change this port number to the port number configured for your FTP server. The default frequency of the probe is every 30 seconds. The default time that the service monitor waits for a probe to complete is five seconds. The probe function attempts to connect to port 21 and expects to read an initial message from the FTP server.
server. If there are no errors, the service status remains Up. If an error occurs, the status is set to Down. TCP Service Monitor The generic TCP service monitor defaults to TCP port 0. You should set the port to the listening port of your server software. The default frequency of the probe is every 30 seconds. The default time that the service monitor waits for a probe to complete is five seconds.
Single-probe monitors perform the probe function only on the node where the monitor instance is active. This type of configuration is useful for applications such as databases that are not cluster-aware and should be run on only one node at a time. See “Advanced Topics” on page 209 for information about developing probe scripts for custom monitors and integrating monitors with custom applications.
Virtual Host: The service monitor is assigned to this virtual host. If the virtual host is associated with a Matrix Server application, the service monitor will also be associated with that application. Port: Matrix Server supplies the default port number for the service you select. If your service uses a port other than the default, type that port number here. Monitor Type: Select the type of service that you want to monitor.
• Custom service monitor. You will be asked for the location of the user probe script. Enter the pathname for the probe script to be used with the monitor. See “Custom Scripts” on page 209 for information about writing probe scripts. When you complete the Add Service Monitor form, the new monitor appears on the Management Console. In this example, the service monitor is active on server owl, which provides the active network interface for the virtual host.
Timeout and Failure Severity This setting works with the virtual host policy (either AUTOFAILBACK or NOFAILBACK) to determine what happens when a probe of a monitored service fails. The default policies (AUTOFAILBACK for the virtual host and AUTORECOVERY for the monitor) cause ClusterPulse to fail over the associated virtual host to a backup network interface on another server.
NOAUTORECOVER. The virtual host fails over when a monitor probe fails and the monitor is disabled on the original node, preventing automatic failback. When the monitor is reenabled, failback occurs according to the virtual host’s failback policy. The NOAUTORECOVER option is useful when integrating Matrix Server with a custom application where certain application-specific actions must be taken before the failback can occur.
the node where the associated virtual host is activated, and the probe takes place on that node. The monitor instances on other nodes are marked as “standby” on the Management Console. If the virtual host fails over to a backup node, the monitor instance on the original node becomes inactive and the probe is no longer run on that node.
Recovery script. Runs after a monitor probe failure is detected, in an attempt to restore the service. Start script. Runs as a service is becoming active on a server. Stop script. Runs as a service is becoming inactive on a server. When a monitor is instantiated for a service (because the ClusterPulse daemon is starting or the configuration has changed), Matrix Server chooses the best server to make the service active. The Start script is run on this server.
Such a failure or timeout creates an event associated with the monitor on the server where the failure or timeout occurred. You can view these events on the Management Console and clear them from the Console or command line after you have fixed the problems that caused them. You can configure the failover behavior with the Event Severity attribute. There are two settings: CONSIDER. This is the default value.
PARALLEL. Matrix Server does not enforce the strict ordering sequence for Stop and Start scripts. The scripts run in parallel across the matrix as a virtual host is in transition. The PARALLEL configuration can speed up failover time for services that do not depend on strict ordering of Start and Stop scripts.
Remove Service Monitor from a Server To remove a service monitor from the network interface associated with a specific server, select the monitor on that server, right-click, and select Remove From Server. You will then be asked to verify that you want to remove the monitor from the server. View Service Monitor Errors To view the last error for a service monitor, select that service monitor, right-click, and select View Last Error.
13 Configure Device Monitors PolyServe Matrix Server provides built-in device monitors that can be used to watch disk devices or to monitor the status of PSFS filesystems. You can also create custom device monitors. Overview Matrix Server provides the following types of device monitors. To configure a device monitor, you will need to specify the probe timeout and frequency and a monitor-specific value.
determines the placement of the associated virtual hosts. For example, if a probe fails on the primary server for a virtual host, the virtual host may fail over to a backup server. See “Device Monitors and Failover” on page 176 for details about where a device monitor is active. Custom device monitors can be configured to be either multi-active or single-active. With a single-active configuration, the monitor is active only on the primary node.
Disk Device Monitor The DISK device monitor can be used to monitor a disk device. It periodically attempts to read the first block of the disk partition that you specify to determine whether the disk is operating normally. Custom Device Monitor A CUSTOM device monitor can be particularly useful when integrating Matrix Server with a custom application. Custom monitors can be configured to be either single-active or multi-active.
2. ClusterPulse considers the list of servers that are both up and enabled and that are configured for the device monitor. Note the following: • A server that has not finished joining the matrix (see “Server Access to the SAN” on page 222) is not considered up for the purpose of activating the device monitor.
Add or Modify a Device Monitor Select the appropriate option: • To add a new device monitor, select the server to be associated with the monitor from the Servers window, right-click, and select Add Device Monitor (or click the Device icon on the toolbar). Then configure the device monitor on the New Device Monitor window. • To update an existing device monitor, select the monitor on the Servers window, right-click, and select Properties.
Timeout and Frequency: These fields are set to the default values for the type of device you have selected. Change them as needed. The other monitor parameters are dependent on the type of monitor that you are creating. • DISK monitor. At the “Monitor partition” prompt, specify a partition on the disk. The monitor will periodically attempt to read the first block of this partition to determine whether the disk is operating normally.
To add or update a device monitor from the command line, use this command: mx device add|update [--type CUSTOM|DISK|SHARED_FILESYSTEM] [--timeout <seconds>] [--frequency <seconds>] [--parameters <value>] <devicemonitorname> [<devicemonitorname>] ... See “Advanced Settings for Device Monitors” for information about the other arguments that can be used for device monitors.
The Probe Severity setting works with the virtual host policy (either AUTOFAILBACK or NOFAILBACK) to determine what happens when a monitored device fails. The default policies (AUTOFAILBACK for the virtual host and AUTORECOVERY for the device monitor) cause ClusterPulse to fail over the associated virtual hosts to a backup network interface on another server when the monitor probe fails.
Custom Scripts The Scripts tab lets you configure custom Recovery, Start, and Stop scripts for a device monitor. Device monitors can optionally be configured with scripts that are run at various points during matrix operation. The script types are as follows: Recovery script. Runs after a monitor probe failure is detected, in an attempt to restore the device. Start script. Runs as a device is becoming active on a server. Stop script.
Start scripts must be robust enough to run when the device is already started, without considering this to be an error. Similarly, Stop scripts must be robust enough to run when the device is already stopped, without considering this to be an error. In both of these cases, the script should exit with a zero exit status.
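The idempotence requirement above can be sketched as a shell function. This is only an illustration: the service name, pid-file path, and init-script location are hypothetical examples, not Matrix Server conventions.

```shell
# Sketch of an idempotent Start script for a device monitor.
# "myservice" and its pid file are hypothetical placeholders.
# If the service is already running, return 0 -- "already started"
# is not an error, per the requirement described above.
start_service() {
  pidfile=${1:-/var/run/myservice.pid}
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    return 0                          # already started: success
  fi
  /etc/rc.d/init.d/myservice start    # actual start action
}
```

A Stop script would mirror this shape, returning 0 when the pid file is absent or the process is already gone.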
IGNORE. Events are ignored and Start or Stop script failures will not cause failover. This is useful when the action performed by the Start and Stop scripts is not critical, but is important enough that you want to keep a record of it.
To configure script ordering from the command line, use this option: --ordering serial|parallel Virtual Hosts The Virtual Hosts tab lets you specify any virtual hosts that you want to fail over if the device fails. (By default, no virtual hosts will fail over.) If you want all virtual hosts to be dependent on the device monitor and to fail over if the device fails, check “Select All Virtual Hosts.”
If you are creating a SHARED_FILESYSTEM monitor, select the virtual hosts associated with applications that access data on the PSFS filesystem being monitored. To specify virtual hosts from the command line, use this option: --vhosts <vhost1>,<vhost2>,... Servers for Device Monitors The Servers tab allows you to select the servers on which the device monitor will be configured. You can also set some options related to the monitor probe operation and failover.
Activity Type. Where the monitor can be active. The options are: • Single-Active. The monitor is active on only one of the selected servers. Upon server failure, the monitor will fail over to an active server unless all associated service and device monitors are down. (“Associated” service and device monitors are those monitors that are associated with the same virtual host as this device monitor.) • Single-Always-Active.
To specify the Probe Type for a custom monitor, use this option: --probe single|active|multiple To specify the Activity Type, use this option: --activity single|multiple Set a Global Event Delay A device monitor that is configured to be multi-active or to probe on multiple servers can experience a global event, in which the shared resource being monitored is reported to be down on all servers.
Enable Global Event Delay. This feature is enabled by default. Delay: Type the number of seconds that the device monitor should wait before failing over virtual hosts following a global event. The default is 65 seconds. To determine the number of seconds for the delay, check the probe frequency and probe timeout values of the shared resource monitors in your configuration.
To delete a device monitor from the command line, use this command: mx device delete <devicemonitorname> ... Disable a Device Monitor Select the device monitor to be disabled from the Servers or Applications tab, right-click, and select Disable. To disable a device monitor from the command line, use this command: mx device disable <devicemonitorname> ...
14 Configure Notifiers If you would like certain actions to take place when matrix events occur, you can configure notifiers that define how the events should be handled. Overview Matrix Server uses notifiers to enable you to view event information generated by servers, network interfaces, virtual hosts, service monitors, device monitors, and filesystems. Notifiers send events from these entities to user-defined notifier scripts.
When adding a notifier, you will need to specify a name for the notifier and to supply the script to be run when an event is triggered that matches the event and entity combination. An event causes the script, which has its standard input wired to a pipe from the notifier_agent, to be run. The notifier script will be run with any arguments that you included in the script string. The script may read STDIN to accept the event message.
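As a minimal sketch of such a script: the notifier_agent pipes the event message to STDIN, and this example simply timestamps each message and appends it to a log file. The log path is an assumed example; a real notifier might instead send mail or a pager message.

```shell
# Sketch of a notifier script. Event messages arrive on standard
# input (one per line, piped in by notifier_agent); each is
# timestamped and appended to a log file. The log path is an
# assumed example -- adjust for your site.
notify() {
  logfile=${1:-/var/log/matrix-events.log}
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%b %d %H:%M:%S')" "$line" >> "$logfile"
  done
}
```

In practice the function body would be the whole script, invoked by Matrix Server with whatever arguments you put in the script string.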
Name: Enter a name for the notifier. You can use up to 32 alphanumeric characters. Script: Enter the name of the script that will be run when an event occurs. Event: Check the events for which you want to receive notification. Entity: Check the entities for which you want to receive notification. The USER1 - USER7 entities are user-defined entities for the mxlogger command. See “Add Your Own Messages to the Matrix Log File” on page 249.
Enable a Notifier Select the notifier to be enabled from the Notifiers window, right-click, and select Enable. To enable a notifier from the command line, use this command: mx notifier enable <notifiername> ... Test a Notifier Select the notifier to be tested from the Notifiers window, right-click, and select Test. The event messages for each configured entity will now be sent to the notifier.
Sample Notifier Messages Following is an example of a notifier message: 10.10.1.1 State VHOSTS 130 Oct 31 2000 13:13:00 Virtual host change - 10.1.1.1 now active on 10.10.1.1 The Test Notifier option causes a test event to be generated for each of the event/entity combinations that you configure for the notifier. Following is an example: 10.10.1.
15 Test Your Configuration After you have configured Matrix Server, we recommend that you perform a set of basic tests to validate that SAN shared filesystem operation, virtual host operation and failover, DNS load-balancing operation and failover, and failover of the LAN administrative network work correctly. After completing these tests successfully, you may want to run a more substantial test of your specific requirements to validate that Matrix Server is working in your environment.
Test SAN Connectivity and Shared Filesystem Operation Use the following procedure to test basic SAN connectivity and shared filesystem operation in your matrix: 1. From the Management Console, log into one of the matrix servers. 2. Import an unused SAN disk into your matrix configuration. 3. Create a PSFS filesystem on an unused partition on this disk. 4. Mount the PSFS filesystem on each server in the matrix. This filesystem will now be shared by these servers. 5. From one server, create a test file on the shared filesystem and verify that the other servers can read it.
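A shared-access check of this kind can be scripted as a sketch; the mount point /mnt/psfs1 is a hypothetical example of where the PSFS filesystem might be mounted:

```shell
# Sketch of a shared-filesystem access check. Run write_probe on
# one server, then read_probe on each of the other servers; both
# succeeding shows that all servers see the same PSFS data.
# /mnt/psfs1 is a hypothetical mount point.
write_probe() {
  dir=${1:-/mnt/psfs1}
  echo "matrix-connectivity-test" > "$dir/.conntest"
}
read_probe() {
  dir=${1:-/mnt/psfs1}
  grep -q "matrix-connectivity-test" "$dir/.conntest"
}
```

Remove the .conntest file after the test so the filesystem is left clean.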
6. Power off one of the servers. Verify that the other servers in the matrix are still able to access the shared filesystem. 7. Restore the power to the server and then reboot it. Verify that this server, upon rebooting, is able to mount the shared filesystem. Verify that all servers are able to access the shared filesystem. Test Virtual Host Operation and Failover The following procedure tests automatic failover and recovery reintegration.
3. Verify that all servers are up, that the service you are testing is up, and that the virtual host is active on the primary server and inactive on the backup servers. 4. Stop the service you are testing on the primary server (for example, for HTTP, bring down the HTTP daemon). 5. Verify that Matrix Server detects the service failure. The virtual host should be inactive on the primary server and active on the first backup server. 6.
The DNS name is www.acmd.com and 192.168.100.1 is a virtual host with primary on acmd1 and backup on acmd2. 192.168.100.2 is primary on acmd2 and backup on acmd1. DNS is set up to round robin on the servers acmd1 and acmd2, using the virtual host addresses 192.168.100.1 and 192.168.100.2. Validate Correct Load-Balancing Operation The following procedure validates that DNS round robin and Matrix Server are working correctly. 1. Ping www.acmd.com. 2.
7. Verify that DNS now serves up both IP addresses again and that the ping is returned correctly by both. Test LAN Failover of Administrative Matrix Traffic Use the following procedure to test the LAN administrative traffic failover capability of Matrix Server: 1. Connect your matrix servers with at least two physically separate LANs. Configure the Linux network software to enable the interfaces to these networks on each of the matrix servers. 2.
16 Performance Monitoring Matrix Server includes a Performance Dashboard that you can use to monitor the following: • Average CPU utilization • Average committed physical memory utilization • Average swap memory utilization • Total PSFS filesystem I/O transfer rate • Total PSFS filesystem I/Os per second • Average one-minute run-queue depth View the Performance Dashboard The Performance Dashboard can report performance information for either all servers in the matrix or a specific server.
The display includes six performance counters. Each counter shows the aggregate value of that counter for all of the servers in the matrix. For example, the average CPU utilization counter shows the average for the CPUs on all of the servers. The status panel at the bottom of the display includes a timestamp showing when the last set of data was received. A new sample is taken every five seconds.
View Counter Details When you click the Detail button for a particular counter, you will see some additional information about the counter, including the name of the counter and the associated performance object (either CPU, memory, filesystem I/O, or system). The Detail view also shows performance information for each instance of the performance object.
You can sort the display according to a particular column. Click the column heading to perform the sort. An up arrow in the column heading indicates that the column is sorted in ascending order. A down arrow indicates that the column is sorted in descending order. You can customize the display in the following ways: • Select the instances that you want to display.
Display the Dashboard for One Server You can also display the Performance Dashboard for a specific server. Select the server on the Management Console, right-click, and select Matrix Performance Dashboard for Server. The values displayed are just for the selected server.
Start the Dashboard from the Command Line To invoke the Dashboard from the command line, use this command: mx matrix dashboard [--server ALL_SERVERS | <servername>] [--datasets <number> | UNLIMITED] [--noHeaders] [--csv] The arguments are: • --server ALL_SERVERS | <servername> The server to be monitored. The default is all servers. • --datasets <number> | UNLIMITED The number of data sets to be returned. The default is one.
kernel attempts to always use memory, but in ways that improve system performance and can easily be reclaimed, such as a disk buffer cache. For this counter, lower numbers are better because they indicate that the kernel is under less pressure to balance memory resources. Average swap memory utilization (%). This counter reports the percentage of allocated swap space that is currently in use by the kernel’s virtual memory subsystem.
17 Advanced Topics The topics described here provide technical details about Matrix Server operations. This information is not required to use Matrix Server in typical configurations; however, it may be useful if you want to design custom scripts and monitors, to integrate Matrix Server with custom applications, or to diagnose complex configuration problems.
• The command will be executed as root. • For a service monitor, the file must be installed on each server associated with the virtual host on which the service monitor is located. • For a device monitor, the file must be installed on each server that is configured with the virtual hosts associated with the device monitor. • The command termination exit status is used to signal script success or failure.
When the monitor executes the testpid script, it will first determine whether the /var/run/application/pid file exists. If the file does not exist, the script exits with a non-zero exit status, which the monitor interprets as a failure. If the file does exist, the script reads the pid from the file into the variable pid. The kill command then determines whether the pid is running. The exit status of the kill command is the exit status of the script.
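The testpid logic described above might look like the following sketch. The text does not show which signal the kill command sends; signal 0 is assumed here because it is the conventional liveness check that does not affect the target process.

```shell
# Sketch of the testpid probe logic described above. The monitor
# treats a non-zero exit status as a probe failure. Signal 0 is an
# assumption: it tests whether the pid exists without signalling it.
testpid() {
  pidfile=${1:-/var/run/application/pid}
  [ -f "$pidfile" ] || return 1        # no pid file: probe fails
  pid=$(cat "$pidfile")
  kill -0 "$pid" 2>/dev/null           # exit status = probe result
}
```

As an installed probe script, the function body would run directly and end with exit rather than return.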
Chapter 17: Advanced Topics 212 Recovery script to reduce the frequency of failovers. The script could contain the following line: /etc/rc.d/init.d/myservice restart When you add a recovery script to a service or device monitor, you can set a timeout period, which is the maximum amount of time that the monitor_agent daemon will wait for the Recovery script to complete.
Chapter 17: Advanced Topics 213 exit non-zero. The service could then become active on another server, causing the Stop script to run on the original server even though the Start script had not completed successfully. When you add Start and Stop scripts to a service or device monitor, you can set a timeout period for each script. Script Environment Variables When you specify a script for a service or device monitor, Matrix Server sets the following environment variables for that script.
Chapter 17: Advanced Topics 214 MX_NAME=name The name of the device monitor. (Applies only to device monitors.) Matrix Server does not set any other variables. If a script requires any other variable, such as a pathname, the script must set that variable itself. The Effect of Monitors on Virtual Host Failover Typically a virtual host has a primary network interface and one or more backup network interfaces. On the servers supplying the interfaces, the state of the virtual host is either active or inactive.
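As a concrete illustration of the script environment described above, here is a minimal Start-script skeleton. It uses only MX_NAME (the one variable documented here) and sets its own pathname variables, since Matrix Server sets nothing else; APPDIR and LOGFILE are placeholders, not part of the product.

```shell
#!/bin/sh
# Minimal device-monitor Start-script skeleton. MX_NAME is supplied
# by Matrix Server (device monitors only); every other variable the
# script needs must be set here. APPDIR and LOGFILE are placeholders.
APPDIR=${APPDIR:-/opt/myapp}
LOGFILE=${LOGFILE:-/tmp/monitor-start.log}

# Record which monitor invoked us, then start the application.
echo "start: monitor=${MX_NAME:-unset}" >> "$LOGFILE"
# "$APPDIR/bin/start"   # hypothetical application start command
true                    # last command's status is the script's status
```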
Chapter 17: Advanced Topics 215 The first example shows the state transitions that occur at startup from an unknown state. At i1, all instances of the monitor have completed stopping. At i2, the virtual host is configured on the Primary. At i3, the monitor start script begins on the Primary and probing begins on the backups. At i4, probing begins on the Primary.
Chapter 17: Advanced Topics 216 At i5 in the following example, the probe fails on the Primary. At i6, the virtual host is deconfigured on the Primary. At i7, the monitor stop script begins on the Primary. At i8, the virtual host is configured on the second backup. At i9, the monitor start script begins on the second backup. At i10, probing begins on the second backup.
Chapter 17: Advanced Topics 217 A custom device monitor also has an activity status on each server. This status indicates the current activity of the monitor on the server. The status can be one of the following: Starting, Active, Suspended, Stopping, Inactive, Failure. If it is necessary to fail over a virtual host associated with the device monitor, Matrix Server looks for a server that meets both of these conditions: the device monitor is active, and the device monitor probe reports an Up status.
Chapter 17: Advanced Topics 218 [Table: state transitions over time (t1, t2, …) showing, for the Primary, First Backup, and Second Backup servers, the virtual host status, service probe status, service monitor activity, device probe status, and device monitor activity.]
Chapter 17: Advanced Topics 219 Integrate Custom Applications There are many ways to integrate custom applications with Matrix Server: • Use service monitors or device monitors to monitor the application • Use a predefined monitor or your own user-defined monitor • Use Start, Stop, and Recovery scripts Following are some examples of these strategies.
Chapter 17: Advanced Topics 220 Built-In Monitor or User-Defined Monitor? To decide whether to use a built-in monitor or a user-defined monitor, first determine whether a built-in monitor is available for the service you want to monitor and then consider the degree of content verification that you need.
Chapter 17: Advanced Topics 221 and then create a CUSTOM service monitor, specifying the path of the script as the “user probe script” parameter. This provides not only verification of the connection, but a degree of content verification. The CUSTOM monitor can also include Start and Stop scripts. Suppose the myservice application caches transactions induced by requests from external users for later commitment to a back-end database server.
18 SAN Maintenance The following information and procedures apply to SANs used with PolyServe Matrix Server. Server Access to the SAN When a server is either added to the matrix or rebooted, Matrix Server needs to take some administrative actions to make the server a full member of the matrix with access to the shared filesystems on the SAN. During this time, the Management Console reports the message “Joining matrix” for the server.
Chapter 18: SAN Maintenance 223 The Management Console typically displays an alert message when a server loses access to the SAN. (See Appendix B for more information about these messages.) Membership Partitions Matrix Server uses a set of membership partitions to control access to the SAN and to store the device naming database, which includes the global device names for SAN disks imported into the matrix. Typically, the membership partitions are created when you install Matrix Server.
Chapter 18: SAN Maintenance 224 Following is some sample output. The command was issued on host 99.10.30.3. The SDMP administrator is the administrator for the matrix to which the host belongs. There are three membership partitions. # mxsanlk This host: 99.10.30.3 This host’s SDMP administrator: 99.10.30.
Chapter 18: SAN Maintenance 225 • trying to lock, not yet committed by owner The SANlock is either not held or has not yet been committed by its holder. The host on which mxsanlk was run is trying to acquire the SANlock. • unlocked, trying to lock The SANlock does not appear to be held. The host on which mxsanlk was run is trying to acquire the SANlock. • unlocked The SANlock does not appear to be held. If a host holds the SANlock, it has not yet committed its hold.
Chapter 18: SAN Maintenance 226 • locked (lock is corrupt, will repair) The host on which mxsanlk was run holds the lock. The SANlock was corrupted but will be repaired. If a membership partition cannot be accessed, use the mxmpconf program to correct the problem. When you invoke mxsanlk, it checks for the Storage Device Monitor Pulse (SDMP) daemon. This daemon is responsible for grabbing and maintaining the locks on the membership partitions.
Chapter 18: SAN Maintenance 227 The mxmpconf utility starts an ASCII interface that you can use to create a new set of membership partitions or to repair the existing partitions. NOTE: Matrix Server cannot be running when you use mxmpconf. To stop the matrix, use the following command: # /etc/init.d/pmxs stop After stopping Matrix Server, type mxmpconf at the operating system prompt. The Main Menu is then displayed.
Chapter 18: SAN Maintenance 228 Maintain Membership Partitions with the Repair Option The Repair Menu allows you to view the membership partition configuration and to perform several maintenance activities. The Repair Menu lists the current membership partitions according to the membership file maintained on the server where you are running the utility. Each server in the matrix has a membership partition file, which is called the “local MP list.
Chapter 18: SAN Maintenance 229 If the status is NOT FOUND or INACCESSIBLE, there may be a problem with the disk or with another SAN component. When the problem is repaired, the status should return to OK. If the status is CORRUPT, you should resilver the partition. This step copies the membership data from a valid membership partition to the corrupted partition. NOTE: The membership partition may have become corrupt because it was used by another application.
Chapter 18: SAN Maintenance 230 created later on must be at least 2 GB in size. The minimum size for a membership partition is 100 MB. Export Configuration Changes When you change the membership partition configuration with mxmpconf, it updates the membership list on the local server. It also updates the lists on the disks containing the membership partitions specified in the local MP file.
Chapter 18: SAN Maintenance 231 Search the SAN for Membership Partitions. The Search option searches the SAN for all partitions that appear to be membership partitions. You can also copy this data to a file. Copyright © 1999-2006 PolyServe, Inc. All rights reserved.
Chapter 18: SAN Maintenance 232 The output includes each membership partition found by the search, whether the partition is active or inactive, the membership list on the disk containing the partition, and the database records for the partitions. Resilver Membership Partitions. Typically, Matrix Server writes data to one membership partition and then copies, or resilvers, that data to the other membership partitions.
Chapter 18: SAN Maintenance 233 To resilver from a partition that is not in the local MP list, select Display all, which shows all disks in the SAN. When you select a disk, the partitions on that disk are displayed. To select a partition, move to that partition and press the spacebar. You can use the Search option on the Repair menu to locate a valid membership partition. The resilver operation synchronizes all other membership partitions and the local membership partition list.
Chapter 18: SAN Maintenance 234 After you select the partition to be removed, you will be asked to select the SAN disk containing the replacement partition. The partitions on that disk are then displayed. To select a partition, move to that partition and press the spacebar. When you choose the new partition, the local path to that partition will appear at the bottom of the window. Select Done to complete the operation. Add a Membership Partition.
Chapter 18: SAN Maintenance 235 The Add option asks you to select the SAN disk containing the new partition. The partitions on that disk are then displayed. To select a partition, move to that partition and press the spacebar. (The minimum size for a membership partition is 100 MB.) The local path to that partition then appears at the bottom of the window. Select Done to complete the operation.
Chapter 18: SAN Maintenance 236 Clear the Host Registry. This option removes all entries from the server registry. It should be used only under the direction of PolyServe Technical Support. CAUTION: Before clearing the server registry, be sure to reboot or power off any servers that were previously removed from the matrix and no longer had access to the SAN. After the servers have been rebooted, they can safely access the SAN.
Chapter 18: SAN Maintenance 237 When you select a disk, the partitions on that disk are displayed. To select a partition, move to that partition and press the spacebar. Information about the partition then appears at the bottom of the window. An 8-MB partition is adequate. The partition you selected is displayed on the Membership Partition Setup window. If you want to use three membership partitions, repeat this procedure to select the additional membership partitions.
Chapter 18: SAN Maintenance 238 Increase the Membership Partition Timeout Under heavy I/O load, I/O timeouts can occur on membership partition accesses, which can cause excessive I/O path switching. If you experience this problem, increasing the membership partition timeout may resolve the issue. Before setting the timeout, be sure to stop Matrix Server. To increase the timeout, edit the file /etc/opt/polyserve/mxinit.conf.
Chapter 18: SAN Maintenance 239 3. Clear the host registry on the matrix. On one server, issue the command mxmpconf at the operating system prompt. Then select “Repair” from the Main Menu. On the Repair Menu, select the option to clear the host registry. 4. Run mxconfig from one server in the matrix. Skip the opening windows and then configure the appropriate fencing method. 5.
Chapter 18: SAN Maintenance 240 Server Cannot Be Located If the matrix reports that it cannot locate a server on the SAN but you know that the server is connected, there may be an FC switch problem. On a Brocade FC switch, log into the switch and verify that all F-Port and L-Port IDs specified in switchshow also appear in the local nameserver, nsshow. If the lists of ports are different, reboot the switch. If the reboot does not clear the problem, there may be a problem with the switch.
Chapter 18: SAN Maintenance 241 The following example shows the operation of the command: $ mx server markdown 99.10.20.4 This utility is used to verify that a server is down in the event that it cannot be fenced and cannot be rebooted. IMPORTANT: This utility must be run only after the server has been physically verified to be down. If the server is not down, running this utility could result in filesystem corruption. Do you wish to continue? y SUCCESS 99.10.20.4 has been marked as down.
Chapter 18: SAN Maintenance 242 Online Insertion of New Storage Matrix Server supports online insertion (OLI) of new storage, provided that OLI support is present for your combination of storage device, SAN fabric, HBA vendor-supplied device driver, and the associated HBA vendor-supplied libhbaapi. (Check with your vendors to determine whether OLI is supported.) When this lower-level OLI support is in place, inserting a new disk will cause a new device to automatically become eligible for importing.
Chapter 18: SAN Maintenance 243 If these conditions are not met, you will not be able to perform online replacement of the switch. Instead, you will need to stop the matrix, replace the switch, and use mxconfig to reconfigure the new switch into the matrix. Consult your switch documentation for the appropriate replacement procedure, keeping in mind that the above requirements must be met. However, if this documentation is not available, the following procedures describe a method to replace a switch.
Chapter 18: SAN Maintenance 244 12. Configure the new switch. If you saved the original configuration with the configUpload command, use the configDownload command to restore it. Otherwise, use the configure command. (You may need to consult your site’s SAN administrator or your Brocade representative for the correct configuration information.) 13. Connect the FC connectors to the new switch. Be sure to plug them into the same ports as on the original switch. 14.
Chapter 18: SAN Maintenance 245 2. If possible, save configuration information from the original switch. Some items such as the zone configuration are not needed and are just insurance against further failures. Other items such as the IP address are available elsewhere but might conveniently be captured here. One way to record the information is to capture the output of a CLI session. The following commands show types of data that might be useful: show ip ethernet for the IP address.
Chapter 18: SAN Maintenance 246 10. Verify that I/O operations are successful via the new switch. Mount a psd device, and then use mxmpio to set its active path to one of the paths that goes through the new switch. Then perform I/O operations such as creating or deleting files on the mounted psd device.
19 Other Matrix Maintenance Although Matrix Server requires little special maintenance beyond that which is normally required for your servers and services, you may need to perform the following activities: • Maintain log files • Collect log files for analysis by PolyServe Technical Support • Disable a server for maintenance • Troubleshoot a matrix • Troubleshoot service and device monitors Maintain Log Files Matrix Server stores its log files in the /var/log/polyserve directory on each server in the matrix.
Chapter 19: Other Matrix Maintenance 248 The messages in the matrix.log file are either local or global. Local messages appear only in the matrix.log file on the server where the message originated. Global messages are distributed to each server in the matrix and are written into the matrix.log file on each server. You can use the Management Console to view or maintain the matrix.log file on each server. The changes you make affect only the log file for the server selected on the Servers window.
Chapter 19: Other Matrix Maintenance 249 Rotate the Matrix Log File Matrix Server rotates the matrix log file on a regular basis. Up to five old versions of the file are saved in the /var/log/polyserve directory; the saved files are named matrix.log.1, matrix.log.2, and so on. You can also rotate the matrix log file from the Management Console. Select the appropriate server on the Servers window, right-click, and select Rotate Log.
Chapter 19: Other Matrix Maintenance 250 The message appears like this in the log file (the fields are Server, Level, Date/time, Facility, Entity, and Message): 192.168.0.1 [Info] [2001-10-07 14:16:27] User USER2 hello, world Matrix Alerts The Alerts section at the bottom of the Management Console window lists errors that have occurred in matrix operations. Double-click the alert (in the Location column) to view the error in the matrix tree structure. The mx alert status command can also be used to display the current alerts.
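Because each line follows the fixed layout shown above, the log is easy to filter with standard tools. The following is a generic sketch (not a Matrix Server utility) that pulls out all messages at a given severity; it assumes the bracketed level is always the second whitespace-separated field, as in the sample line.

```shell
# Print all matrix.log lines at the given severity level.
# Assumes the layout shown above: server [Level] [date time] ...
# Usage: loglevel_filter Info /var/log/polyserve/matrix.log
loglevel_filter() {
    level=$1
    file=$2
    awk -v lvl="[$level]" '$2 == lvl' "$file"
}
```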
Chapter 19: Other Matrix Maintenance 251 [root@venus1 /]# cd /tmp [root@venus1 tmp]# /opt/polyserve/tools/mxcollect This utility should only be run from a server with PolyServe Matrix Server installed. Collecting configuration data mxcollect uses ssh to connect to each node in the matrix. As a result, you will be prompted for the root password on each node and for the login information for your FibreChannel switch. Following is an example. Executing log collection on server: 99.12.4.21 root@99.12.4.
Chapter 19: Other Matrix Maintenance 252 [root@venus1 tmp]# ls 7659 hsperfdata_root ksocket-root matrixinfo matrixserver_info.tgz orbit-root ssh-nvpm6364 _4E123592 _4E123593 _4E124578 _4E124608 [root@venus1 tmp]# Upload mxcollect Files to PolyServe Technical Support After running mxcollect, you can upload the resulting files to PolyServe Technical Support. The ftp account is at ftp.polyserve.com.
Chapter 19: Other Matrix Maintenance 253 2. If you want the virtual host to remain on the backup network interface after the original server is returned to operation, make that network interface the primary network interface. (Choose the virtual host from the Virtual Hosts window, right-click, and select Properties.) 3. Perform the necessary maintenance on the original server and then reenable it. Detection of Down Servers The ClusterPulse daemon uses heartbeats to determine whether a server is up.
Chapter 19: Other Matrix Maintenance 254 Troubleshoot Matrix Problems The following situations do not produce specific error messages. The Server Status Is “Down” If a server is running but Matrix Server shows it as down, follow these diagnostic steps: 1. Verify that the server is connected to the network. 2. Verify that the network devices and interfaces are properly configured on the server. 3. Ensure that the ClusterPulse daemon is running on the server. 4.
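Diagnostic steps like number 3 above can be scripted. The helper below is a generic sketch (not part of Matrix Server) that reports whether a process with a given command name is running; substitute the actual daemon name, such as the ClusterPulse daemon's process name on your system.

```shell
# Generic helper: is a process with the given command name running?
# Usage: check_daemon <name>; exit status 0 if a match is found.
check_daemon() {
    name=$1
    if ps -e -o comm= | grep -q "^${name}"; then
        echo "$name: running"
    else
        echo "$name: NOT running"
        return 1
    fi
}
```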
Chapter 19: Other Matrix Maintenance 255 Matrix Server Exits Immediately If the ClusterPulse daemon exits immediately on starting, check the last lines of the following files for errors: • /var/log/polyserve/matrix.log • /var/log/polyserve/mxinit.log This problem typically occurs because either the hostname is not set properly on the server or the main Ethernet interface is not installed. Refer to the ifconfig man page for ways to check this.
Chapter 19: Other Matrix Maintenance 256 monitor was not found, the HTTP service monitor will be reported as Down. “Undefined” Status If the probe has not completed because of a script configuration problem or because Matrix Server is still attempting to finish the first probe, the status will be reported as “undefined” instead of Down. “SYSTEM ERROR” Status The “SYSTEM ERROR” status indicates that a serious system functional error occurred while Matrix Server was trying to probe the service.
Chapter 19: Other Matrix Maintenance 257 START_TIMEOUT. A Start script was executed but it did not complete within the specified timeout period. STOP_TIMEOUT. A Stop script was executed but it did not complete within the specified timeout period. RECOV_TIMEOUT. A Recovery script was executed but it did not complete within the specified timeout period. START_FAILURE. A Start script was executed but it returned a non-zero exit status. STOP_FAILURE.
Chapter 19: Other Matrix Maintenance 258 Because the error is server-specific, you must clear it on each server in the matrix (just as you had to correct the script on each server that reported a problem). NOTE: An error on a monitor may still be indicated after correcting the problem with the Start, Stop, Recovery, or probe script. Errors can be cleared only with the Management Console or the appropriate mx command. An error will not be automatically cleared by the ClusterPulse daemon.
Chapter 19: Other Matrix Maintenance 259 “Activity Unknown” Status For a brief period while the monitor_agent daemon checks the monitor script configuration and creates a thread to serve the monitor, the activity may be displayed as “activity unknown.” “Transitioning” Activity The “Transitioning” activity indicates that the monitor state is on its way to becoming ACTIVE or INACTIVE (or starting or stopping, if a Start or Stop script is present).
Chapter 19: Other Matrix Maintenance 260 service or device to fail periodically and you do not want to take the failover action for a single probe failure. Putting a script like this in place essentially implements a “two consecutive probe-script failure” probe. Matrix Server Network Port Numbers Matrix Server uses a set of network ports for external connections, such as connections from the Management Console and the mx utility.
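The "two consecutive probe-script failure" idea described above can be sketched as a wrapper around the real probe. This is a generic illustration, not a PolyServe-supplied script; the real probe command and the state-file path are placeholders you would choose yourself.

```shell
# Report failure only on the second consecutive failure of the real
# probe. $1 is the real probe command, $2 is a state-file path used
# to remember that the previous probe failed (both are placeholders).
two_strike_probe() {
    real_probe=$1
    state=$2
    if $real_probe; then
        rm -f "$state"       # success clears any recorded failure
        return 0
    fi
    if [ -f "$state" ]; then
        rm -f "$state"       # second consecutive failure: report it
        return 1
    fi
    touch "$state"           # first failure: remember it, report success
    return 0
}
```

In practice, this wrapper logic would itself be the body of the probe script, with its exit status interpreted by the monitor as usual.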
Chapter 19: Other Matrix Maintenance 261
Port  Transport Type  Description
7659  TCP   Group Communications client connections
7659  UDP   Group Communications multicast and unicast messages
7660  UDP   Group Communications control token
7661  UDP   Group Communications administration and statistics
8940  UDP   PanPulse network health detector
9060  TCP   DLM control and statistics connections
9060  UDP   DLM point-to-point messages
9065  TCP   MSM control and statistics connections
9065  UDP   MSM point-to-point messages
A Management Console Icons The Management Console uses the following icons. Matrix Server Entities The following icons represent the Matrix Server entities. If an entity is disabled, the color of the icon becomes less intense.
Appendix A: Management Console Icons 263 Additional icons are added to the entity icon to indicate the status of the entity. The following example shows the status icons for the server entity. The status icons are the same for all entities and have the following meanings. Monitor Probe Status The following icons indicate the status of service monitor and device monitor probes. If the monitor is disabled, the color of the icons is less intense.
Appendix A: Management Console Icons 264 On the Applications tab, virtual hosts and single-active monitors use the following icons to indicate the primary and backups. Multi-active monitors use the same icons but do not include the primary or backup indication. Management Console Alerts The Management Console uses the following icons to indicate the severity of the messages that appear in the Alert window.
B Error and Log File Messages When certain errors occur, Matrix Server writes messages to the Management Console. Other error messages are written to the server’s log file (matrix.log). Management Console Alert Messages NN.NN.NN.NN has lost a significant portion of its SAN access, possibly due to a SAN hardware failure The specified server is unable to write to any of the membership partitions. Ensure that the server can access the membership partitions and also has write access to them.
Appendix B: Error and Log File Messages 266 NN.NN.NN.NN should be rebooted ASAP as it stopped matrix network communication DATE HH:MM:SS and was excluded from the SAN to protect filesystem integrity The server was excluded from the matrix because it could no longer communicate over the network. The server should be rebooted at the first opportunity. Also check the network and make sure that the server is not experiencing a resource shortage. NN.NN.NN.
Appendix B: Error and Log File Messages 267 Error connecting to server: I/O error Error connecting to server An error occurred when Matrix Server tried to connect to the specified server through the Connect to Matrix window. Error getting cluster status from server: <error> The <error> describes the error. The connection to the server on port 9050 was successful but the first response from the server experienced an I/O error.
Appendix B: Error and Log File Messages 268 Internal error: unable to initialize security Internal error: unable to initialize security. Java program problem. Contact service. If you receive one of these messages, report it to PolyServe Technical Support at your earliest opportunity. Majority of membership partitions are unwritable, possibly due to a SAN or storage hardware failure. As a result, disk imports and deports cannot be done, and some servers may be unable to mount filesystems.
Appendix B: Error and Log File Messages 269 Matrix unable to take control of SAN, because another matrix that includes NN.NN.NN.NN currently controls the SAN. Possibly a networking failure or misconfiguration has partitioned these servers from the servers that control the SAN, or possibly this matrix has been misconfigured to share membership partitions with another matrix. Check the matrix configuration and add the server if it is not currently a member.
Appendix B: Error and Log File Messages 270 Matrix unable to take control of SAN. As a result, some or all filesystem operations may be paused throughout the matrix. In addition, filesystem mounts and unmounts and disk imports and deports cannot be performed. Verify that this message, not one of the previous messages, is displayed. Also verify that the servers can access the membership partitions and have write access to them, and that the servers can communicate with the FibreChannel switch.
Appendix B: Error and Log File Messages 271 psdNpN on NN.NN.NN.NN is stalled on locks from NN.NN.NN.NN A DLM lock request has been outstanding for a long period of time on the specified server. Probably the server is severely overloaded or is experiencing a resource shortage. As a last resort, reboot the server to clear the problem.
Appendix B: Error and Log File Messages 272 Singleton matrix unable to take control of SAN, because the matrix that includes NN.NN.NN.NN currently controls the SAN. Possibly this server has not been added to the matrix or has been deleted from the matrix, or possibly a networking failure or misconfiguration has partitioned this server from the servers that control the SAN. Check the matrix configuration and add the server if it is not currently a member.
Appendix B: Error and Log File Messages 273 ClusterPulse Messages Bad command -- Could not find device monitor instance for XXX on server YYY The monitor_agent daemon is reporting status on a device monitor with device name XXX on server YYY but the ClusterPulse daemon does not recognize this device. Probably the Management Console has removed the device monitor and monitor_agent has already sent the status to ClusterPulse. Therefore, no corrective action is required.
Appendix B: Error and Log File Messages 274 Internal system error -- Internal error at server X.X.X.X: select returned with an unknown read socket N Internal system error -- Internal error at server X.X.X.X: select returned with an unknown write socket N Internal system error -- Internal select error at server X.X.X.X: [select ?] with errno of N The ClusterPulse daemon received a system error. Report this error to PolyServe Technical Support at your earliest opportunity.
Appendix B: Error and Log File Messages 275 Monitor error -- monitor_agent reported N:: The monitor_agent daemon experienced an error and is copying the error string to the matrix.log file. Inspect the error string for details about resolving the error. Network error -- set_readable called with unknown socket N Network error -- set_writeable called with unknown socket N If you receive this message, notify PolyServe Technical Support at your earliest convenience.
Appendix B: Error and Log File Messages 276 Script error -- Write to monitor failed: <error> for agent monitor_agent. Shutting down agent. Script error -- Write to monitor failed: <error>. This probably means the agent has crashed for agent monitor_agent. Shutting down agent. The ClusterPulse daemon experienced an error while trying to write to the monitor_agent daemon. It will attempt to recover from this failure.
Appendix B: Error and Log File Messages 277 Write error - in default_write_fun: Unknown connection mode for IP %s port %d Read error - in default_read_fun: Unknown connection mode for IP %s port %d If you receive either of these messages, notify PolyServe Technical Support at your earliest convenience. PSFS Filesystem Messages If you receive a panic message from the PSFS filesystem, report it to PolyServe Technical Support at your earliest convenience.
Appendix B: Error and Log File Messages 278 Fatal messages have this format: [Fatal ] [] SANPulse SERVERS A fatal message indicates that Matrix Server has terminated on the specified server. First attempt to restart Matrix Server on that server. If the matrix software cannot be restarted, you will see another message asking you to reboot the server.
Appendix B: Error and Log File Messages 279 Interface has gone down after an attempt to assign an outbound multicast interface: Interface has been marked down because its interface flags do not include IFF_UP If PanPulse determines that the interfaces are down or unavailable on another server, it will report the following: No interfaces are responding on host
Appendix B: Error and Log File Messages 280 panpulse will not provide service until the required interface flags (UP, RUNNING, BROADCAST, MULTICAST) are set on interface . See ifconfig(8). IPv4 Only IPv4 is supported. If another address family is specified for the network, PanPulse will report an error such as the following: Interface has addrtype AF_INET6, skipping Separate Networks Each network interface card must be on a separate network.
Appendix B: Error and Log File Messages 281 Fence Agent Error Messages Fence Agent error messages are typically caused by operational problems with a FibreChannel switch, a loss of network access, or an unresponsive switch. For Cisco MDS FibreChannel switches, errors can appear if the default VSAN is disabled. Operational Problems When fabric fencing is configured on the matrix, errors such as the following can appear in the Alert panel on the Management Console.
Appendix B: Error and Log File Messages 282 The following utility is useful for determining whether a matrix node can communicate with the FibreChannel switch. The command displays the name service entries of the WWPN and the Fabric IDs of the switch. It may take approximately five or six seconds for the output to be displayed.
Index A AUTOFAILBACK, virtual host 158 administrative network defined 3 failover 52 network topology 50 requirements for 49 select 50 administrative traffic 55 alerts Alerts pane on Management Console 30 display error on Management Console 30 display on command line 250 icons shown on Management Console 264 applications create 137 filter 141 manage 142 name of 137 status 139 Applications tab drag and drop operations 142 filter applications 141 format 141 icons 139 manage application monitors 146 manage ap
Index 284 device 176 service 163 custom scripts 209 configuration limits 73 convert from basic 83 create 74 defined 72 destroy 81 extend 79 guidelines 74 list subdevices with sandiskinfo 64 names 73 properties 77 recreate 81 striped defined 73 stripe state 78 stripeset 79 subdevices 72 D design guidelines 11 device database defined 7 membership partitions 57 device monitor activeness policy 176 activity status 258 CUSTOM monitor 176 defined 9 DISK monitor 176 events 190 Global Event Delay 188 multi-acti
Index 285 test 198 virtual host 148 virtual host activeness policy 155 FC switch cannot locate server 240 online replacement 242 Fence Agent error messages 281 fencing cannot fence server 240 change method of 238 Fence Agent error messages 281 third-party MPIO 70 filesystem, PSFS access 85 atime updates 110 backup 112 check with psfsck 109 crash recovery 88 create 90 create with mkpsfs command 94 create with mx command 94 destroy 114 extend 105 features 85 features, configured 106 journal 86 mount 95 moun
Index 286 active 229 add 234 defined 57 display 230 inactivate 235 inactive 229 increase timeout 238 remove 233 repair 228 replace 233 resilver 232 memory, server 11 mkpsfs command 94 multipath I/O defined 12 example 15 manage with mxmpio 65 QLogic driver, enable failover 69 third-party 70 mx commands 24 mx server markdown command 240 mxcollect utility 250 mxinit utility error messages 280 process monitoring 32 start or stop software 34 mxlogd daemon 6 mxlogger command 249 mxmpconf utility 226 mxmpio comm
Index 287 ports, network configure 261 external 260 internal 260 primary server 8 probe scripts 210 probe severity, failover 159 processes monitor 32 psd driver 5 PSFS filesystem.
Index 288 add or update 164 advanced settings probe severity 166 scripts 169 delete 172 disable 172 enable 172 service monitor types custom 163 FTP 162 HTTP 162 NFS 162 TCP 163 services file 261 shared disks import 58 SHARED_FILESYSTEM device monitor 175 SMDS (Shared Memory Data Store) 7 snapclone 133 snapshots create from command line 135 create from Management Console 134 defined 133 delete 135 errors 135 mount 136 mount after creating 134 support for 133 unmount 136 Start scripts device monitor 182 ser