PolyServe® Administration Guide PolyServe Matrix Server 3.5 File Serving Utility™ 3.5 Database Utility™ 3.5 for Red Hat Enterprise Linux AS/ES 4.
Copyright © 2004-2007 PolyServe, Inc. Use, reproduction and distribution of this document and the software it describes are subject to the terms of the software license agreement distributed with the product (“License Agreement”). Any use, reproduction, or distribution of this document or the described software not explicitly permitted pursuant to the License Agreement is strictly prohibited unless prior written permission from PolyServe has been received.
Contents
HP Technical Support: HP Storage Website, HP NAS Services Website
1 Introduction: Product Features, Overview, The Structure of a Matrix
MxFS-Linux, Dynamic Volumes, Matrix Management Applications, Manage a Matrix with the Management Console, Authentication Parameters and Bookmarks, Manage Bookmarks
Change the IP Address for a Server, Matrix Server License File, Upgrade the License File, Supported Matrix Server Features, Migrate Existing Servers to Matrix Server, Configure Servers for DNS Load Balancing
An Example of Changing the I/O Path, Display Status Information, Display MPIO Statistics, Set the Timeout Value, Other MPIO Support
Unmount from the Management Console, Unmount from the Command Line, Persistent Mounts, Persistent Mounts on a Server, Persistent Mounts for a Filesystem, View or Change Filesystem Properties
Other Export Group Procedures, Configure the Global NFS Probe Settings, Configure Virtual NFS Services, Sample Configurations, Guidelines for Creating Virtual NFS Services, Add a Virtual NFS Service
Create a Snapshot or Snapclone, Errors During Snapshot Operations, Delete a Snapshot, Mount or Unmount Snapshots
11 Matrix Operations on the Applications Tab: Applications Overview
Delete a Service Monitor, Disable a Service Monitor on a Specific Server, Enable a Previously Disabled Service Monitor, Remove Service Monitor from a Server, View Service Monitor Errors, Clear Service Monitor Errors
16 Performance Monitoring: View the Performance Dashboard, Display the Dashboard for All Servers, View Counter Details, Display the Dashboard for One Server, File Serving Performance Dashboard, Start the Dashboard from the Command Line
FibreChannel Switches: Changes to Switch Modules, Add a New FibreChannel Switch, Online Replacement of a FibreChannel Switch
19 Other Matrix Maintenance: Maintain Log Files, The matrix.log File
Network Interface Requirements Are Not Met, mxinit Messages, Fence Agent Error Messages, Operational Problems, Loss of Network Access or Unresponsive Switch, Default VSAN Is Disabled on Cisco MDS FC Switch
HP Technical Support Telephone numbers for worldwide technical support are listed on the following HP website: http://www.hp.com/support. From this website, select the country of origin. For example, the North American technical support number is 800-633-3600. NOTE: For continuous quality improvement, calls may be recorded or monitored.
HP Technical Support xv service partners. For more information see us at http://www.hp.com/hps/storage/ns_nas.html. For the latest documentation, go to http://www.hp.com/support/manuals.
1 Introduction Matrix Server provides a matrix structure for managing a group of network servers and a Storage Area Network (SAN) as a single entity. It includes a cluster filesystem for accessing shared data stored on a SAN and provides failover capabilities for applications. The Matrix Volume Manager can be used to create and manage dynamic volumes, which allow large filesystems to span multiple disks, LUNs, or storage arrays.
Chapter 1: Introduction 2 on a SAN. After a PSFS filesystem has been created on a SAN disk, all servers in the matrix can mount the filesystem and subsequently perform concurrent read and/or write operations to that filesystem. PSFS is a journaling filesystem and provides online crash recovery. • Availability and reliability.
Chapter 1: Introduction 3 same performance) than a single server. A price advantage is gained by using commodity-level Intel-based servers instead of larger SMP (4-way or 8-way) servers or larger proprietary filers to accommodate scaling client connectivity demands. • Scalable NFS performance. With multiple NFS servers serving the same filesystems and with appropriate client balancing among the servers, MxFS-Linux supports linearly increasing NFS performance.
Chapter 1: Introduction 4 • Group migration of NFS clients. The administrator can gracefully migrate an NFS service and the associated NFS clients from one server to another for maintenance or load-balancing purposes, without downtime or failure of any NFS client. • Cluster-wide consistent user authentication.
Chapter 1: Introduction 5 Overview The Structure of a Matrix A matrix includes the following physical components. [Figure: servers connected to the Internet and public LANs, to an administrative network (LAN), and through an FC switch to RAID subsystems.] Servers. Each server must be running Matrix Server. Public LANs. A matrix can include up to four network interfaces per server.
Chapter 1: Introduction 6 networks be isolated from the networks used by external clients to access the matrix. Storage Area Network (SAN). The SAN includes FibreChannel switches and RAID subsystems. Disks in a RAID subsystem are imported into the matrix and managed from there. After a disk is imported, you can create PSFS filesystems on it. Software Components The Matrix Server software is installed on each server in the matrix.
Chapter 1: Introduction 7 mxperfd. Gathers performance data from files in the /proc filesystem for use by various system components. PanPulse. Selects and monitors the network to be used for the administrative network, verifies that all hosts in the matrix can communicate with each other, and detects any communication problems. pswebsvr. The embedded web server daemon used by the Management Console and the mx utility. SANPulse. Provides the matrix infrastructure for management of the SAN.
Chapter 1: Introduction 8 Matrix Volume Manager The Matrix Volume Manager can be used to create dynamic volumes consisting of disk partitions that have been imported into the matrix. Dynamic volumes can be configured to use either concatenation or striping. A single PSFS filesystem can be placed on a dynamic volume. The Matrix Volume Manager can also be used to extend a dynamic volume and the filesystem located on that volume.
Chapter 1: Introduction 9 Matrix Server Databases Matrix Server uses the following databases to store matrix information: • Shared Memory Data Store (SMDS). The SANPulse daemon stores filesystem status information in this database. The database contains cp_status and sp_status files that are located in the directory /var/opt/polyserve/run on each server. These files should not be changed. • Device database. The SCL assigns a device name to each shared disk imported into the matrix.
Chapter 1: Introduction 10 the same filesystem exports) regardless of any failover transitions, from one node to another, of the Virtual NFS Service. Virtual Hosts and Failover Protection Matrix Server uses virtual hosts to provide failover protection for servers and network applications. A virtual host is a hostname/IP address configured on one or more servers. The network interfaces selected on those servers to participate in the virtual host must be on the same subnet.
Chapter 1: Introduction 11 Matrix Server includes several built-in service monitors for monitoring well-known network services. You can also configure custom monitors for other services. A device monitor is similar to a service monitor; however, it is designed either to watch a part of a server such as a local disk drive or to monitor a PSFS filesystem. A device monitor is assigned to one or more servers. Matrix Server provides several built-in device monitors.
Chapter 1: Introduction 12 Matrix Server Multipath I/O Multipath I/O can be used in a matrix configuration to eliminate single points of failure. It supports the following: • Up to four FibreChannel ports per server. If an FC port or its connection to the fabric should fail, the server can use another FC port to reach the fabric. • Multiple FibreChannel switches. When the configuration includes more than one FC switch, the matrix can survive the loss of a switch without disconnecting servers from the SAN.
Chapter 1: Introduction 13 Single FC Port, Single FC Switch, Single FC Fabric This is the simplest configuration. Each server has a single FC port connected to an FC switch managed by the matrix. The SAN includes two RAID arrays. In this configuration, multiported SAN disks can protect against a port failure, but not a switch failure.
Chapter 1: Introduction 14 Single FC Port, Dual FC Switches, Single FC Fabric In this example, the fabric includes two FC switches managed by the matrix. Servers 1–3 are connected to the first FC switch; servers 4–6 are connected to the second switch. The matrix also includes two RAID arrays, which contain multiported disks. If a managed FC switch fails, the servers connected to the other switch will survive and access to storage will be maintained.
Chapter 1: Introduction 15 Dual FC Ports, Dual FC Switches, Single FC Fabric This example uses multipath I/O to eliminate single points of failure. The fabric includes two FC switches managed by the matrix. Each server has two FC ports; the first FC port connects to the first FC switch and the second FC port connects to the second FC switch. The matrix also includes two RAID arrays containing multiported disks.
Chapter 1: Introduction 16 Dual FC Ports, Dual FC Switches, Dual FC Fabrics This example is similar to the previous example, but also includes dual FC fabrics, with a matrix-managed FC switch in each fabric. If one of the fabrics should fail, the servers can access the storage via the other fabric. [Figure: five servers on an IP administrative network, each with two FC ports connected to two FC switches, one switch in each FC fabric, leading to two RAID arrays.]
Chapter 1: Introduction 17 iSCSI Configuration This example shows an iSCSI configuration. The iSCSI initiator is installed on each server. Ideally, a separate network should be used for connections to the iSCSI storage arrays. (iSCSI configurations are supported only on SLES9 systems.) [Figure: five servers connected through a network switch to two iSCSI arrays.]
2 Matrix Administration PolyServe Matrix Server can be administered either with the Management Console or from the command line. Administrative Considerations You should be aware of the following when managing Matrix Server. Network Operations • Normal operation of the matrix depends on a reliable network hostname resolution service. If the hostname lookup facility becomes unreliable, this can cause reliability problems for the running matrix.
Chapter 2: Matrix Administration 19 adversely affect the entire matrix. If you need to add or delete a network interface on a particular server, first stop Matrix Server on that server. Then make your administrative change and restart Matrix Server. • A network interface should not be modified or taken down while Matrix Server is running. Attempting to do this can cause adverse effects.
Chapter 2: Matrix Administration 20 server must not be established while Matrix Server is running on the server. • Changing the hardware configuration, such as adding or replacing switch modules, is not supported while Matrix Server is running. This restriction applies only to FibreChannel switches that are under the control of Matrix Server. • The Linux kernel can access only partitions 01 through 15 on SCSI disks.
Chapter 2: Matrix Administration 21 Tested Configuration Limits The tested configuration limits for Matrix Server and MxFS-Linux are as follows. These limits will be increased as additional testing takes place. Theoretically, the tested configuration limits can be exceeded up to the bounds of the operating system. If you plan to configure your matrix beyond the tested limits, please contact PolyServe Technical Support for information about any known configuration issues or concerns.
Chapter 2: Matrix Administration 22 Dynamic Volumes The configuration limits for dynamic volumes are as follows: • A maximum size of 128 TB (64-bit OS) or 16 TB (32-bit OS) for a dynamic volume. • A maximum of 512 dynamic volumes per matrix. • A maximum of 128 subdevices per dynamic volume. • Concatenated dynamic volumes can be extended up to 128 times; however, the total number of subdevices cannot exceed 128.
Chapter 2: Matrix Administration 23 NOTE: For improved performance, the Management Console caches hostname lookups. If your DNS changes, you may need to restart the console so that it will reflect the new hostname. Start the Management Console To start the Management Console, first start the windowing environment and then type the following command: $ mxconsole The Matrix Server Connect window then appears. If the window does not display properly, verify that your DISPLAY variable is set correctly.
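If the console is being started over a remote session and the Connect window does not appear, a quick check of the display setting might look like the following (the host and display number shown are only examples):
$ echo $DISPLAY
$ export DISPLAY=workstation.example.com:0
$ mxconsole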
Chapter 2: Matrix Administration 24 User: Type the name of the user who will be accessing the matrix. The administrator is user admin. Other users have read permission only. Password: Type the user’s password. If you do not want to be prompted for the password again, click the “Remember this password” checkbox. (For the password to be saved, you will also need to create a bookmark.) Add to bookmarks: Click this checkbox to create a bookmark for this connection.
Chapter 2: Matrix Administration 25 Manage Bookmarks The Bookmarks display lists the matrix connections that are configured in the .matrixrc file. Click the Bookmarks button on the Matrix Server Connect window to display the current list of bookmarks and the available options. You can connect to any of the servers or matrices in the list. Double-click on the server or matrix, or select it and then click on either Connect or Configure. The bookmark options are: • Add.
Chapter 2: Matrix Administration 26 Set Default. If you set a server as the default, Matrix Server will first attempt to use that server to connect to the matrix. If the server is not available, Matrix Server will start at the top of the list of servers and attempt to connect to them in turn until it reaches an available server. If you have several matrices in the Bookmarks list, you can set one of them to be the default for connections when the Management Console is started. • Move Up/Move Down.
Chapter 2: Matrix Administration 27 Update an Existing .matrixrc File to Use New Features If your .matrixrc file was used in a Matrix Server release earlier than 3.5.0 and has single servers configured, you will need to create a bookmark entry for the matrix in order to use the “synchronize bookmarks” feature. To do this, take one of these steps: • Click the Add button on the Matrix Server Connect window to add a new bookmark. On the Add Bookmark window, specify a server and the name of the matrix.
Chapter 2: Matrix Administration 28 Manage a Matrix from the Command Line The mx utility allows you to manage Matrix Server from the command line. See the PolyServe Command Reference for more information about mx. PSFS filesystems can also be managed with Linux shell commands. Changes made with these commands are reflected on the Management Console. The Management Console When you connect to the matrix via mxconsole, the Management Console appears.
Chapter 2: Matrix Administration 29 The toolbar at the top of the window can be used to connect or disconnect from a matrix, to add new matrix entities such as servers, monitors, or filesystems, to mount or unmount filesystems, to import or deport disks, to collapse or expand the entity lists, and to display online help.
Chapter 2: Matrix Administration 30 Virtual Hosts Tab The Virtual Hosts tab shows all virtual hosts in the matrix. For each virtual host, the window lists the network interfaces on which the virtual host is configured, any service monitors configured on that virtual host, and any device monitors associated with that virtual host.
Chapter 2: Matrix Administration 31 Applications Tab This view shows the Matrix Server applications, virtual hosts, service monitors, and device monitors configured in the cluster and provides the ability to manage and monitor them from a single screen. The applications, virtual hosts, and monitors appear in the rows of the table. The servers on which they are configured appear in the columns.
Chapter 2: Matrix Administration Filesystems Tab The Filesystems tab shows all PSFS filesystems in the matrix.
Chapter 2: Matrix Administration Notifiers Tab The Notifiers tab shows all notifiers configured in the matrix.
Chapter 2: Matrix Administration 34 Matrix Alerts The Alerts section at the bottom of the Management Console window lists errors that have occurred in matrix operations. You can double click on an alert message to see more information about the alert: • If the complete alert message does not fit in the Description column, double-click on that column to open a window displaying all of the text.
Chapter 2: Matrix Administration 35 • Double-click in the Location column to display the error on the Management Console. • Double-click on the Application column to display that application on the Applications tab. Add Users or Change Passwords The admin user is the matrix administrator. Other users can view the configuration but cannot make changes to it. If you need to add a new user or change an existing password on a particular server, use one of these methods: Matrix Configuration window.
Chapter 2: Matrix Administration 36 Matrix Server Processes and mxinit When Matrix Server is running on a server, the following Matrix Server processes should be active:
clusterpulse  The ClusterPulse daemon
panpulse  The PanPulse daemon
sanpulse  The SANPulse daemon
dlm  The Distributed Lock Manager daemon
grpcommd  The matrix-wide communications daemon
mxregd  The daemon that manages portions of the internal Matrix Server configuration
mxeventd  The distributed event subscription and notification service
Chapter 2: Matrix Administration 37 NOTE: When you invoke mxinit to start Matrix Server, by default it continues running and monitors processes. If you do not want mxinit to monitor processes, invoke it with the -M (or --no-monitor) option. It will then exit after it completes the options you specified. View Matrix Server Process Status You can use the mxinit utility or the pmxs script to display the status of Matrix Server processes and modules.
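The exact mxinit and pmxs invocations for displaying status are described in the PolyServe Command Reference. Independent of those tools, a quick check that the daemons listed above are present can be made with standard shell commands, for example:
# ps -e | egrep 'clusterpulse|panpulse|sanpulse|grpcommd|mxregd|mxeventd'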
Chapter 2: Matrix Administration 38 mxinit Configuration File mxinit performs its actions according to a set of default values. You can use the /etc/opt/polyserve/mxinit.conf configuration file to override these values. The file describes the available options and the required format. We recommend that you change this file only under the direction of PolyServe personnel. Start or Stop Matrix Server Matrix Server runs on each server in the matrix.
Chapter 2: Matrix Administration 39 If you specify both --stop and --hard, the mxinit command first attempts the --stop operation. If it fails, mxinit then executes the --hard operation. • -H, --hard Perform a hard, immediate stop of the Matrix Server processes. mxinit first attempts to terminate any applications accessing PSFS filesystems. It then unmounts the filesystems, terminates the Matrix Server processes, and unloads Matrix Server modules. • -L, --load-mod Load all Matrix Server modules.
Chapter 2: Matrix Administration 40 For example, on SLES9, locate the line in the /etc/init.d/nfs script beginning with “# Required-Start:” and add pmxs to the end of the line. Change: # Required-Start: $network $portmap to: # Required-Start: $network $portmap pmxs Back Up and Restore the Configuration It is important to back up the matrix configuration whenever changes are made. You can then easily restore the matrix if necessary.
Chapter 2: Matrix Administration 41 MxFS-Linux Configuration The MxFS-Linux configuration is stored in an online database. To back up the information in the database, you will need to save the Export Group configuration and the NLM state.
Chapter 2: Matrix Administration 42 To restore the membership partition data, use the mpimport command. The following example imports the data from the default membership partition backup file and recreates the membership partitions. # /opt/polyserve/lib/mpimport -s -M -F NOTE: Matrix Server must be stopped on all nodes when mpimport is used. Database corruption can occur if the utility is executed while Matrix Server is running on a node.
Chapter 2: Matrix Administration 43 External Network Port Numbers The following port numbers are used for external connections to Matrix Server. If Matrix Server is behind a firewall, it may be necessary to change the firewall rules to allow traffic for these ports.
Chapter 2: Matrix Administration 44 Change Configurable Port Numbers If it is necessary to change one or more of the network port numbers used by Matrix Server, simply add the service line or lines that you want to change to the /etc/services file on all nodes in the matrix. (See the next section for the format of the entries.) After editing the file, either reboot each node or stop and restart Matrix Server on all nodes.
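Entries in /etc/services use the standard format of a service name followed by a port number and protocol. A hypothetical override might look like the following; the service name and port shown here are placeholders rather than the actual Matrix Server defaults, and the same lines must be added on every node:
mx-example-service    9050/tcp    # example Matrix Server service entry
mx-example-service    9050/udp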
Chapter 2: Matrix Administration
exportgroup(8)  mx command to manage export groups (provided with MxFS-Linux)
fs(8)  mx command to manipulate PSFS filesystems
matrix(8)  mx command to manipulate a matrix
matrixrc(8)  The matrix configuration file
mkpsfs(8)  Create a PSFS filesystem
mount_psfs(8)  Mount a PSFS filesystem
mx(8)  Manipulate a matrix
mxinit(8)  Start, stop, or monitor Matrix Server processes
mxlogger(8)  Add a log message to the matrix log
mxmpconf(8)  Manage membership partitions
service(8)  mx command to manipulate service monitors
sleep(8)  Pause between mx commands
snapshot(8)  mx command to manage hardware snapshots
vhost(8)  mx command to manipulate virtual hosts
vnfs(8)  mx command to manage Virtual NFS Services (provided with MxFS-Linux)
3 Configure Servers Before adding a server to a matrix, verify the following: • The server is connected to the SAN if it will be accessing PSFS filesystems. • The server is configured as a fully networked host supporting the services to be monitored. For example, if you want Matrix Server to provide failover protection for your Web service, the appropriate Web server software must be installed and configured on the servers. • If the /etc/hosts file has been modified, it should be consistent with the DNS.
Chapter 3: Configure Servers 48 3. Select the SAN & Fencing tab on the Configure Matrix window and then select FibreChannel switch-based fencing. In the SAN Switches section of the tab, click the Add button to configure the new switch. 4. Click Apply at the bottom of the Configure Matrix window. 5. Go to the Matrix-Wide Configuration tab, select all servers except the server to which you are connected, and then click the "Export To" button. 6. Restart Matrix Server on the existing nodes.
Chapter 3: Configure Servers 49 When a new server is added, ClusterPulse updates the SAN information on that server. During the update, the Management Console may be unable to report SAN information such as the status of PSFS filesystems. You can ignore this; the matrix is operating correctly and the SAN information will reappear within a few seconds. Configure a Server To add a new server to a matrix, select Matrix > Add > Add Server on the Management Console.
Chapter 3: Configure Servers 50 There are two settings for Server Severity: AUTORECOVER. This is the default behavior. Virtual hosts automatically fail back to network interfaces on the original server after it is restored to normal operation. NOAUTORECOVER. After the server recovers, it is not made available to host virtual hosts. Instead, Matrix Server disables the server. (You will need to re-enable the server with the Management Console or mx utility).
Chapter 3: Configure Servers 51 Delete a Server Select the server to be deleted from the Servers window on the Management Console, right-click, and select Delete. To delete servers from the command line, use this command: mx server delete ... Disable a Server Select the server to be disabled from the Servers window on the Management Console, right-click, and select Disable. When you disable a server, any active virtual hosts and device monitors on the server will become inactive.
Chapter 3: Configure Servers 52 3. Start Matrix Server on server S2a. The server joins the matrix, which now consists of servers S1, S2, S3, and S2a. Server S2 is down and S1, S2a, and S3 are up. 4. Delete server S2 from the matrix. This step will remove references to the server. 5. Update virtual hosts and any other matrix entities that used server S2 to now include S2a. Matrix Server License File To operate properly, Matrix Server requires that a license file be installed on each server in the matrix.
Chapter 3: Configure Servers 53 for a particular server. Select the server on the Servers window, right-click, and select View Features. The Supported Features window lists all of the Matrix Server features. Any features not supported by your license are greyed out. Migrate Existing Servers to Matrix Server In Matrix Server, the names of your servers should be different from the names of the virtual hosts they support. A virtual host can then respond regardless of the state of any one of the servers.
Chapter 3: Configure Servers 54 virtual host—basically, a failover-protected version of the server, with no difference in appearance to the clients. • Keep the existing name on the server. If you do not rename the server, clients will need to use the new virtual host name to benefit from failover protection. Clients can still access the server by its name, but those requests are not protected by Matrix Server.
Chapter 3: Configure Servers 55 [Figure: all virtual host traffic is directed to servers acmd1 and acmd2 through the virtual hosts virtual_acmd1 and virtual_acmd2.] The addresses on the domain name server are virtual_acmd1 and virtual_acmd2. Two virtual hosts have also been created with those names. The first virtual host uses acmd1 as the primary server and acmd2 as the backup. The second virtual host uses acmd2 as the primary and acmd1 as the backup.
Chapter 3: Configure Servers 56 IP address: The IP addresses for the virtual hosts you will use for each server in the matrix. These are the IP addresses that the DNS will use to send alternate requests. (In this example, virtual host virtual_acmd1 uses IP address 10.1.1.1 and virtual host virtual_acmd2 uses IP address 10.1.1.2.) With this setup, the domain name server sends messages in a round-robin fashion to the two virtual hosts indicated by the IP addresses, causing them to share the request load.
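As a sketch, assuming a BIND-style zone file for a hypothetical acme.com domain, the round-robin behavior comes from publishing both virtual host addresses for the name that clients use:
; illustrative zone file excerpt
virtual_acmd1   IN  A  10.1.1.1
virtual_acmd2   IN  A  10.1.1.2
; a single client-facing name resolving to both addresses in turn
service         IN  A  10.1.1.1
service         IN  A  10.1.1.2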
4 Configure Network Interfaces When you add a server to the matrix, Matrix Server determines whether each network interface on that server meets the following conditions: • The network interface is up and running. • Broadcast and multicast are enabled on the network interface. • Each network interface card (NIC) is on a separate network. Network interfaces meeting these conditions are automatically configured into the matrix.
Chapter 4: Configure Network Interfaces 58 When Matrix Server is started, the PanPulse daemon selects the administrative network from the available networks. When a new server joins the matrix, the PanPulse daemon on that server tries to use the established administrative network. If it cannot use that network, the PanPulse daemon on the new server will look for another network that all of the servers can use.
Chapter 4: Configure Network Interfaces 59 Matrix Server will fail over the traffic to an interface that discourages administrative traffic. Virtual Hosts A virtual host is created on a set of network interfaces. These network interfaces must be enabled for virtual hosting. By default, all network interfaces are enabled; however, you can disable a network interface if you do not want it to carry virtual host traffic.
Chapter 4: Configure Network Interfaces 60 Administrative Network Failover An administrative network failure occurs when the interface on a particular server is no longer receiving Matrix Server administrative traffic. Some possible causes of the failure are a bad cable or network interface card (NIC). When the administrative network fails on a server, the PanPulse daemon on that server attempts to select another network to act as the administrative network.
Chapter 4: Configure Network Interfaces 61 do this can cause adverse effects. If any part of a network interface definition needs to be changed, stop Matrix Server on the affected server, shut down the network, and then make the change. When the change is complete, bring up the network and restart Matrix Server on the node. Add or Modify a Network Interface When you add a server to the matrix, its network interfaces are automatically configured into the matrix.
Chapter 4: Configure Network Interfaces 62 Server: The name or IP address of the server that will include the new network interface. IP: Type the IP address for the network interface. Netmask: Type the net mask for the network interface. Allow Admin. Traffic: Specify whether the network interface can host administrative traffic. The default is to allow the traffic.
Chapter 4: Configure Network Interfaces 63 mx netif admin ... Use the following command to discourage administrative traffic: mx netif noadmin ... When you configure a network interface to allow or discourage administrative traffic, the setting applies to all interfaces within the same subnet on all servers of the cluster. Enable or Disable a Network Interface for Virtual Hosting By default, all network interfaces are enabled for virtual hosting.
5 Configure the SAN SAN configuration includes the following: • Import or deport SAN disks. After a disk is imported, it can be used for PSFS filesystems. • Change the partitioning on SAN disks. • Display information about SAN disks. • Manage multipath I/O. Overview SAN Configuration Requirements Be sure that your SAN configuration meets the requirements specified in the PolyServe Installation Guide. Storage Control Layer Module The Storage Control Layer (SCL) module manages shared SAN devices.
Chapter 5: Configure the SAN 65 As part of managing shared SAN devices, the SCL is also responsible for providing each disk with a globally unique device name that all servers in the matrix can use to access the device. Device Names The SCL uses unique device names to control access to shared SAN devices. These names form the pathnames that servers use to access shared data. When you import a SAN disk, the SCL gives it a global device name that represents the entire disk.
Chapter 5: Configure the SAN 66 You can use the mxmpconf utility to fix any problems with the membership partitions. Device Access Once imported, a shared device can be accessed only with its global device name, such as psd6p4. On each server, the SCL creates device node entries in the directory /dev/psd for every partition on the disk. The names of the entries match the global device names of the partitions.
Chapter 5: Configure the SAN 67 disk. The individual partitions are identified by the disk name followed by p and the partition number, such as psd25p4. To import disks using the Management Console, select Storage > Disk > Import or click the Import icon on the toolbar. The Import Disks window, which appears next, shows all SAN disks that are not currently imported into the matrix. The disk descriptions include the vendor, the disk’s UID, and its size.
Chapter 5: Configure the SAN 68 Deport SAN Disks Deporting a disk removes it from matrix control. The /dev/psd device nodes are removed and the original /dev entries are re-enabled. You cannot deport a disk that contains a mounted filesystem or a membership partition. Also, disks configured in a dynamic volume cannot be deported. (You will need to destroy the dynamic volume and then deport the disk.
Chapter 5: Configure the SAN 69 Change the Partitioning on a Disk The Linux fdisk utility can be used to change the partition layout on a SAN disk. If the disk is currently imported into the matrix, you must first deport the disk. When you use fdisk, the changes made to the partition table are visible only to the server where you made the changes. When you reimport the disk, the other servers in the matrix will see the updated partition table.
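As a sketch, assuming the deported disk appears on the local server as /dev/sdc (the device name will vary), the repartitioning step uses the standard fdisk workflow:
# fdisk /dev/sdc
(create or modify the partitions, then write the new table with the w command before reimporting the disk)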
Chapter 5: Configure the SAN 70 When you select a disk, the window displays information about the partitions on the disk. Select a partition to display the corresponding Linux mount path for the PSFS filesystem. To import or deport a disk, select that disk and then click Import or Deport as appropriate. Storage Summary The Storage Summary window shows information about the PSFS filesystems configured on the matrix and also lists the LUNs that are currently unused and available for filesystems.
Chapter 5: Configure the SAN 71 • The mount point assigned to the filesystem. Click in the cell to see the mount point for each server on which the filesystem is configured. • The volume used for the filesystem. Click in the cell to see the properties for the filesystem. • The number of exports for the filesystem multiplied by the number of servers exporting the filesystem. Click in the cell to see the names of the servers and the export paths.
Chapter 5: Configure the SAN 72 -v Display available volumes. -f Display PSFS filesystem volumes. -a Display all information; for -v, display all known volumes. -l Additionally display host-local device name. -r Additionally display local device route information. -U Display output in the format used by the Management Console. This option is used internally by Matrix Server and does not produce human-readable output. -q Suppress output of all log messages.
Chapter 5: Configure the SAN 73 Show Filesystem Information The -f option displays existing PSFS filesystems on imported disks. # sandiskinfo -f Volume: /dev/psv/psv1 Size: 2439M (PSFS Filesystem) Stripesize=0K Local Mount Point=/mnt Volume: /dev/psd/psd1p6 Size: 490M (PSFS Filesystem) Disk=20:00:00:04:cf:13:38:18::0 partition=06 type=Linux (83) Local Mount Point=(not mounted) Show Available Volumes The -v option lists available volumes on imported disks.
Chapter 5: Configure the SAN 74
Dynamic Volume: psv2   Size: 490M   Stripe=32K
Dynamic Volume: psv3   Size: 490M   Stripe=8K
Show Properties for Dynamic Volumes The --dynvol_properties [volname] option lists detailed properties for the specified dynamic volumes. volname is the psv name, such as psv2. If this option is omitted, the properties for all dynamic volumes are displayed.
Chapter 5: Configure the SAN 75 You can use the following command to control whether this failover behavior can occur on a particular node. Run the command on the server where you want to change the failover behavior. (Matrix Server starts with failover enabled.) # mxmpio enableall|disableall Enable or Disable Failover for a PSD Device When a failure occurs in the I/O path to a particular PSD device, Matrix Server by default fails over to another I/O path.
Chapter 5: Configure the SAN 76 With the exception of I (the array index), the value specified is converted to the corresponding host adapter/channel before being used to select the target. An Example of Changing the I/O Path In this example, we will change the target for a device. The mxmpio status -l command identifies the path currently being used by each device. That path is labeled “active.” The following output shows that device psd2p1 is active on target 1.
Chapter 5: Configure the SAN 77
psd2p2 enabled 30000
0. (41:12) scsi2/0/1/20 (active)
1. (08:52) scsi1/0/1/20
Display Status Information The status command displays MPIO status information, including the timeout value, whether MPIO is enabled (globally and per-device), and any targets specified with the active command. Use the -l option to display more information about the targets, as in the above example.
Chapter 5: Configure the SAN 78 Set the Timeout Value The default timeout period for PSD devices is 30 seconds. If you need to modify this value for a particular PSD device, use the following command. value is in milliseconds; however, the smallest unit is 10 milliseconds. A value of zero disables timeouts. # mxmpio timeout value [PSD-device] Other MPIO Support Enable the MPIO Failover Feature on QLogic Drivers QLogic device drivers have an MPIO failover feature.
Chapter 5: Configure the SAN 79 • Options, enclosed in double quotes, to pass to insmod when it loads the driver. If no options are required, type a pair of double quotes (““) in the field. • A text description of the driver. Edit the fc_pcitable File To enable the failover feature, you will need to edit the fc_pcitable file. In the file, locate the line for your device driver. (For version 8.00.00 and later drivers, the option is in the qla2xxx module.
Chapter 5: Configure the SAN 80 scanning process in the hope that the third-party MPIO software will discover and manage the devices. If the process succeeds, Matrix Server will continue to start as normal. If the third-party software is unable to discover the target devices during this process, Matrix Server causes the node to reboot a second time. The goal of the second reboot is to allow the third-party MPIO software a better chance to discover devices before any other dependent software starts.
6 Configure Dynamic Volumes Matrix Server includes a Matrix Volume Manager that you can use to create, extend, recreate, or delete dynamic volumes. Dynamic volumes allow large filesystems to span multiple disks, LUNs, or storage arrays. Overview Basic and Dynamic Volumes Volumes are used to store PSFS filesystems. There are two types of volumes: dynamic and basic. Dynamic volumes are created by the Matrix Volume Manager.
Chapter 6: Configure Dynamic Volumes 82 Types of Dynamic Volumes Matrix Server supports two types of volumes: striped and concatenated. The volume type determines how data is written to the volume. • Striping. When a dynamic volume is created with striping enabled, a specific amount of data (called the stripe size) is written to each subdevice in turn. For example, a dynamic volume could include three subdevices and a stripe size of 64 KB.
Chapter 6: Configure Dynamic Volumes 83 • Striped dynamic volumes can be extended up to 16 times; however, the total number of subdevices cannot exceed 128. Guidelines for Creating Dynamic Volumes When creating striped dynamic volumes, follow these guidelines: • The subdevices used for a striped dynamic volume should be the same size. The Matrix Volume Manager uses the same amount of space on each subdevice in the stripeset.
Chapter 6: Configure Dynamic Volumes 84 Filesystem: If you want Matrix Server to create a filesystem that will be placed on the dynamic volume, enter a label to identify the filesystem. If you do not want a filesystem to be created, remove the checkmark from “Create filesystem after volume creation.” If you are creating a filesystem, you can also select the options to apply to the filesystem. Click the Options button to see the Filesystem Option dialog. The General tab allows you to select the block size.
Chapter 6: Configure Dynamic Volumes 85 The Quotas tab allows you to specify whether disk quotas should be enabled on this filesystem. If you enable quotas, you can set the default quota for users and groups on that filesystem. If you do not want a default limit, click "Unlimited." To assign a limit, click "Limit" and then specify the appropriate size in either kilobytes, megabytes, gigabytes, or terabytes. The defaults are rounded down to the nearest filesystem block.
Chapter 6: Configure Dynamic Volumes 86 to all users and groups; however, you can change the quota for a specific user or group. See “Manage Filesystem Quotas” on page 175 for more information. Available Subdevices: The display includes all imported subdevices that are not currently in use by another volume and that do not have a filesystem in place. The subdevices that you select are used in the order in which they appear on the list.
Chapter 6: Configure Dynamic Volumes 87 Dynamic Volume Properties To see the configuration for a dynamic volume, select Storage > Dynamic Volume > Volume Properties and then choose the volume that you want to view. If a filesystem is associated with the volume, the Volume Properties window shows information for both the dynamic volume and the filesystem. The Stripe State reported in the "Dynamic Volume Properties" section will be one of the following: • Unstriped.
Chapter 6: Configure Dynamic Volumes 88 filled before writes to the next stripeset begin. To change the Stripe State to optimal, you will need to recreate the dynamic volume. To display the properties from the command line, use the following command: mx dynvolume properties Extend a Dynamic Volume The Extend Volume option allows you to add subdevices to an existing dynamic volume.
Chapter 6: Configure Dynamic Volumes 89 Dynamic Volume Properties: The current properties of this dynamic volume. Filesystem Properties: The properties for the filesystem located on this dynamic volume. Available Subdevices: Select the additional subdevices to be added to the dynamic volume. Use the arrow keys to reorder those subdevices if necessary. Extend Filesystem: To increase the size of the filesystem to match the size of the extended volume, click this checkbox.
Chapter 6: Configure Dynamic Volumes 90 Delete a Dynamic Volume When a dynamic volume is deleted, the filesystem on that volume, and any persistent mounts for the filesystem, are also deleted. Before deleting a dynamic volume, be sure that the filesystem is no longer needed or has been copied or backed up to another location. The filesystem must be unmounted when you perform this operation. To delete a dynamic volume from the Management Console, select Storage > Dynamic Volume > Delete Volume.
Chapter 6: Configure Dynamic Volumes 91 another location. The filesystem must be unmounted when you recreate the volume. To recreate a dynamic volume on the Management Console, select Storage > Dynamic Volume > Recreate Volume and then choose the volume that you want to recreate. If a filesystem is mounted on the volume, the Recreate Dynamic Volume window shows information for both the dynamic volume and the filesystem.
Chapter 6: Configure Dynamic Volumes 92 Convert a Basic Volume to a Dynamic Volume If you have PSFS filesystems that were created directly on an imported disk partition or LUN (a basic volume), you can convert the basic volume to a dynamic volume. The new dynamic volume will contain only the original subdevice; you can use the Extend Volume option to add other subdevices to the dynamic volume. NOTE: The new dynamic volume is unstriped. It is not possible to add striping to a converted dynamic volume.
7 Configure PSFS Filesystems PolyServe Matrix Server provides the PSFS filesystem. This direct-access shared filesystem enables multiple servers to concurrently read and write data stored on shared SAN storage devices. A journaling filesystem, PSFS provides live crash recovery.
Chapter 7: Configure PSFS Filesystems 94 to take the appropriate actions. For example, if you want to spread a kernel build across four servers, you will need to run a cooperative make. Journaling Filesystem When you initiate certain filesystem operations such as creating, opening, or moving a file or modifying its size, the filesystem writes the metadata, or structural information, for that event to a transaction journal. The filesystem then performs the operation.
Chapter 7: Configure PSFS Filesystems 95 Filesystem Management and Integrity Matrix Server uses the SANPulse daemon to manage PSFS filesystems. SANPulse performs the following tasks: • Coordinates filesystem mounts, unmounts, and crash recovery operations. • Checks for matrix partitioning, which can occur when matrix network communications are lost but the affected servers can still access the SAN.
Chapter 7: Configure PSFS Filesystems 96 Crash Recovery When a server using a PSFS filesystem either crashes or stops communicating with the matrix, another server in the matrix will replay the filesystem journal to complete any transactions that were in progress at the time of the crash. Users on the remaining servers will notice a slight delay while the journal is replayed. Typically the recovery procedure takes only a few seconds. The recovery process restores only the structural metadata information.
Chapter 7: Configure PSFS Filesystems 97 Filesystem Restrictions The following restrictions apply to the PSFS filesystem: • A PSFS filesystem cannot be used as the root or /boot filesystem. • A server can mount another non-shared filesystem on a directory of a PSFS filesystem; however, that filesystem will be local to the host. It will not be mounted on other hosts in the matrix. • PSFS filesystems cannot be mounted using the Linux loop device.
Chapter 7: Configure PSFS Filesystems 98 The mount option also turns off the filesystem’s file data coherency control, allowing the database to manage the coherency of its own data files. Matrix Server Database Option is available as a separately licensed product. Create a Filesystem A PSFS filesystem can be created on a basic volume (a psd device) or a dynamic volume (a psv device). On a 32-bit operating system, PSFS filesystems must use 4KB as the block size.
Chapter 7: Configure PSFS Filesystems 99 Create a Filesystem from the Management Console To create a filesystem, select Matrix > Add > Add Filesystem or click the Filesystem icon on the toolbar. Label: Type a label that identifies the filesystem. Available Volumes: This part of the window lists the basic or dynamic volumes that are currently unused. Select one of these volumes for the filesystem.
Chapter 7: Configure PSFS Filesystems filesystem must be 4K. Be sure to select the appropriate block size for your filesystem (see "Create a Filesystem" on page 98). The Quotas tab allows you to specify whether disk quotas should be enabled on this filesystem.
Chapter 7: Configure PSFS Filesystems 101 If you enable quotas, you can set the default quota for users and groups on that filesystem. If you do not want a default limit, click "Unlimited." To assign a limit, click "Limit" and then specify the appropriate size in either kilobytes, megabytes, gigabytes, or terabytes. The defaults are rounded down to the nearest filesystem block.
Chapter 7: Configure PSFS Filesystems 102 • --blocksize <4K|8K|16K|32K> The block size used by the filesystem. For a 32-bit operating system, the block size must be 4K. For a 64-bit operating system, the supported sizes are 4K, 8K, 16K, or 32K. The maximum filesystem size is dependent on the block size used for the filesystem. The block size cannot be changed after the filesystem is created.
Chapter 7: Configure PSFS Filesystems 103 The command has this syntax: /opt/polyserve/sbin/mkpsfs [-q] [-n ] [-l
Chapter 7: Configure PSFS Filesystems 104 The -o option has the following parameters: • blocksize=# The block size for the filesystem. For a 32-bit operating system, the valid block size is 4096, or 4K. For a 64-bit operating system, the valid block sizes are 4096, 8192, 16384, and 32768. The block sizes can also be specified as 4K, 8K, 16K, and 32K. • disable-fzbm Create the filesystem without Full Zone Bit Maps (FZBM).
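For example, a filesystem could be created on a dynamic volume with an 8 KB block size by combining the syntax above with the blocksize parameter (the device name is illustrative; on a 32-bit operating system the block size must remain 4096):
# /opt/polyserve/sbin/mkpsfs -o blocksize=8192 /dev/psv/psv1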
Chapter 7: Configure PSFS Filesystems 105 Mount a Filesystem You can mount a PSFS filesystem on any server that can access the storage device over the SAN. The directory mountpoint for the filesystem must exist before the filesystem is mounted. The filesystem must be mounted either read-write or read-only across the matrix. It cannot be mounted read-only on one server and read-write on another server. To change the way the filesystem is mounted, first unmount it on all servers, then remount it.
Chapter 7: Configure PSFS Filesystems 106 On Servers: Select the servers where the filesystem is to be mounted. Shared Mount Options This option applies to all servers on which the filesystem is mounted. Read/Write or Read Only. Mount the filesystem read-write or read-only. Read/Write is the default. Server Mount Options These mount options can be different on each server. Mount point: Type the directory mount point for the filesystem. Activate: To mount the filesystem now, click Activate.
Chapter 7: Configure PSFS Filesystems 107 Persist: This option causes the filesystem to be remounted automatically when the server is rebooted and is enabled by default. If you do not want the filesystem to be remounted automatically, remove the checkmark. Create Directory: If you want Matrix Server to create the directory mountpoint on each server where the filesystem is to be mounted, click Create Directory. Select any other mount options for the filesystem. Async or Sync.
Chapter 7: Configure PSFS Filesystems 108 To take advantage of the DB Optimized performance optimization, an application’s read or write buffer argument to the read or write system call must be page-aligned and must be at least a multiple of 512 bytes in length. Additionally, the target file address (the offset from the beginning of the file where the I/O will start) must also be a multiple of 512 bytes. If a transfer does not meet these three requirements, it will be slower.
Chapter 7: Configure PSFS Filesystems 109 Note for Oracle Real Application Clusters Users When placing Oracle Clusterware files on DB Optimized mounted filesystems, the files must be pre-initialized with the dd(1) command. This is a requirement with any CFS when using Oracle Clusterware. To presize the Oracle Clusterware files, perform the following action: $ dd if=/dev/zero of=/ocr.dbf bs=1024k count=200 $ dd if=/dev/zero of=/css.
Chapter 7: Configure PSFS Filesystems 110 NOTE: This mount option should be used with extreme caution. When cluster-coherent record lock operations are disabled, the record locks established against a file in the filesystem by each node in the cluster are invisible to all other nodes in the same cluster.
Chapter 7: Configure PSFS Filesystems 111 • EXEC/ NOEXEC. Permit (or do not permit) the execution of binaries on the mounted filesystem. EXEC is the default. NOEXEC can be used on a system that has filesystems containing binaries for other architectures. • SUID/ NOSUID. Allow (or do not allow) set-user-id bits and set-group-id bits to take effect. SUID is the default. • SHARED/ EXCLUSIVE. Either allow all servers having physical access to the filesystem to mount it or allow only one server.
Chapter 7: Configure PSFS Filesystems 112 • For a psv device, the device is specified as /dev/psv/psvXXX, where XXX is the volume number. For example, /dev/psv/psv1. For example, the following command mounts the filesystem on the partition /dev/psd/psd12p6 in read-write mode at the mountpoint /data1.
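Assuming mount_psfs(8) follows the usual mount conventions, such an invocation would look something like the following (see the mount_psfs(8) man page for the exact option syntax):
# mount_psfs -o rw /dev/psd/psd12p6 /data1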
Chapter 7: Configure PSFS Filesystems 113 Unmount from the Command Line To unmount a filesystem from the command line, use one of the following commands. PSFS filesystems cannot be forcibly unmounted. The Matrix Server mx command. The --persistent argument removes the persistent status from the filesystem mount; the --active argument unmounts the filesystem now. mx fs unmount [--persistent] [--active] ALL_SERVERS| ... The Linux umount command.
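For example, using the mount point from the earlier mount example, the standard Linux command would be:
# umount /data1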
Chapter 7: Configure PSFS Filesystems 114 The Edit Persistent Mounts window lists all filesystems having a persistent mount on this server. • To remove the “persistent” mount status for one or more filesystems, select those filesystems and then click Delete. • To mount a filesystem with the options specified for the persistent mount, select that filesystem and then click Activate.
Chapter 7: Configure PSFS Filesystems 115 The Edit Persistent Mounts window lists all servers that are configured to have a persistent mount for the filesystem. • To remove the “persistent” mount status on a particular server, select that server and then click Delete. • To mount the filesystem with the options specified for the persistent mount, select the appropriate servers and click Activate.
Chapter 7: Configure PSFS Filesystems 116 Volume Tab On the Properties window, the Volume tab provides information about the storage device and allows you to extend the filesystem. If the filesystem is on a dynamic volume, you can click the Volume Properties button to see more information about the volume, including the subdevices used for the dynamic volume and the stripe size and state.
Chapter 7: Configure PSFS Filesystems 117 When you click Yes, Matrix Server will extend the filesystem to use all of the available space. If the filesystem is on a dynamic volume, you can use the Extend Volume option to increase the size of both the dynamic volume and the filesystem. Features Tab The Features tab shows whether Full Zone Bit Maps (FZBM) or quotas are enabled on the filesystem.
Chapter 7: Configure PSFS Filesystems 118 Quotas Tab The Quotas tab allows you to enable or disable quotas on the filesystem, to set the default hard limit for users and groups, and to view or modify the quotas for individual users and groups. If you enable quotas, you can set the default quota for users and groups on that filesystem. If you do not want a default limit, click “Unlimited.
Chapter 7: Configure PSFS Filesystems 119 View Filesystem Status The Filesystems tab on the Management Console lists information about all filesystems in the matrix. For each filesystem, the display lists the servers on which the filesystem is configured, including the mountpoints. The following example shows that filesystem u02 is mounted at /u02 on three servers and is not currently mounted on one server.
Chapter 7: Configure PSFS Filesystems 120 View Filesystem Errors for a Server The Management Console can display information about the last error or event that occurred on a particular filesystem mount. To see the information, select the filesystem and then select the server. Right-click, and select View Last Error. Check a Filesystem for Errors If a filesystem is not unmounted cleanly, the journal will be replayed the next time the filesystem is mounted to restore consistency.
Chapter 7: Configure PSFS Filesystems 121 For more information about the check, click the Details button. If psfsck locates errors that need to be repaired, it will display a message telling you to run the utility from the command line. For more information, see the PolyServe Command Reference or the psfsck(8) man page. CAUTION: We strongly recommend that you make a backup copy of the entire partition before you attempt to run psfsck with the --rebuild-tree option.
Chapter 7: Configure PSFS Filesystems 122 Because atime updates can have a performance impact, and to maintain the same behavior as in previous versions of the PSFS filesystem, the atime update feature is disabled by default. If needed, the feature can be enabled on specific nodes. A filesystem mounted on those nodes will perform atime updates unless the feature is disabled specifically for that filesystem at mount time. NOTE: Access times are never updated on read-only filesystems.
Chapter 7: Configure PSFS Filesystems 123 Suspend a Filesystem for Backups The psfssuspend utility suspends a PSFS filesystem in a stable, coherent, and unchanging state. While the filesystem is in this state, you can copy it for backup and/or archival purposes. When copying directly from a suspended device, be sure to use the raw device (/dev/rpsd/...) to ensure that all of the copied blocks are up-to-date.
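A hedged sketch of a backup sequence based on the description above; the device paths are illustrative, and the name of the companion resume utility (shown here as psfsresume) is an assumption, not confirmed by this guide:
# psfssuspend /dev/psd/psd1p7                              # place the filesystem in a stable, unchanging state
# dd if=/dev/rpsd/psd1p7 of=/backup/psd1p7.img bs=1024k    # copy from the raw device so all blocks are up-to-date
# psfsresume /dev/psd/psd1p7                               # assumed name of the resume step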
Chapter 7: Configure PSFS Filesystems 124 NOTE: If an attempt to mount the copied filesystem fails with an “FSID conflict” error, run the following command as user root. In the command, <device> is the psd or psv device, such as /dev/psd/psd1p7 or /dev/psv/psv1, containing the copied filesystem, and
Chapter 7: Configure PSFS Filesystems 125 You can use the -s option to specify the new size for the filesystem. If you do not specify the size, the filesystem will grow to the size of the partition. The -s option can be used as follows.
Chapter 7: Configure PSFS Filesystems 126 on the Edit Persistent Mounts window, select the nodes where the filesystem should be mounted and click Activate. • For filesystems without a persistent mount, follow the normal procedure to mount the filesystem as described under “Mount a Filesystem” on page 105. Context Dependent Symbolic Links A Context Dependent Symbolic Link (CDSL) contains a keyword that identifies a particular location.
Chapter 7: Configure PSFS Filesystems 127 Examples Locate a Target by Its Hostname This example uses three servers: serv1, serv2, and serv3. Each server must have specific configuration files in the /oracle/etc directory. You can use a CDSL to simplify accessing these server-specific files. 1. Create a subdirectory for each server in /oracle, which is a PSFS filesystem. Then create an /etc subdirectory in each server directory.
Chapter 7: Configure PSFS Filesystems 128 1. Create a subdirectory in /oracle for each machine type and then create a bin and sbin directory in the new machine-type directories. You now have the following directories in the /oracle PSFS filesystem:
/oracle/<machine-type-1>/bin
/oracle/<machine-type-1>/sbin
/oracle/<machine-type-2>/bin
/oracle/<machine-type-2>/sbin
2. Copy the appropriate binaries to the new bin and sbin directories. 3.
Chapter 7: Configure PSFS Filesystems 129 When you are logged in on serv1, the /oracle/etc symbolic link will point to /etc/serv1.xvz.com. On serv2, it will point to /etc/serv2.xvz.com. Matrix-Wide File Locking Matrix Server supports matrix-wide locks on files located on PSFS filesystems. The locks are implemented with the standard Linux flock() system call, which is also known as the BSD flock interface.
Chapter 7: Configure PSFS Filesystems 130 Unlock a Semaphore To unlock a PSFS command-line semaphore, use this command:
$ psfssema -r <semaphore-file>
The command unlocks the PSFS command-line semaphore associated with <semaphore-file>, which is a semaphore file created by psfssema -i. If other nodes are blocked on the semaphore when psfssema -r is called, one of the blocked psfssema -g processes will return successfully.
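A typical sequence of these commands, based on the options described above (the semaphore file path is illustrative only):
$ psfssema -i /mnt/psfs1/build.sema   # create the semaphore file
$ psfssema -g /mnt/psfs1/build.sema   # grab the semaphore; blocks while another node holds it
$ psfssema -r /mnt/psfs1/build.sema   # release the semaphore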
8 Configure MxFS-Linux Matrix Server and MxFS-Linux provide scalability and high availability for the Network File System (NFS), which is commonly used on UNIX and Linux systems to share files remotely. Overview MxFS-Linux Concepts and Definitions MxFS-Linux uses and manages the following objects to provide scalable and highly available file service across the cluster: Export Groups, export records, and Virtual NFS Services.
Chapter 8: Configure MxFS-Linux 132 exported PSFS filesystems are mounted. These monitors can initiate failover actions when the monitor probe reports a failure. Export Records An export record is identical to an individual record in the /etc/exports file of a traditional NFS server; the format of the record and the available options are the same in form and function.
Chapter 8: Configure MxFS-Linux 133 one of the networks already in use in the cluster. The administrator then specifies both a primary node to host the Virtual NFS Service and an ordered set of backup nodes to host the Virtual NFS Service in case of failover. Finally, the administrator selects one Export Group to be associated with the Virtual NFS Service. This Export Group defines which exports will be available via NFS for this Virtual NFS Service.
Chapter 8: Configure MxFS-Linux 134 monitoring process as the Virtual NFS Services come and go from the cluster node. Program number 300277 has been assigned officially by the RPC numbering registrar. Configure Export Groups An Export Group describes a set of PSFS filesystems to be exported. It also specifies the Virtual NFS Services that will provide virtual IP addresses that clients use to access those filesystems.
Chapter 8: Configure MxFS-Linux 135 One or Many Export Groups? In the typical configuration, a single Export Group associated with multiple Virtual NFS Services is recommended. High Availability and Failover Support MxFS-Linux provides high availability for the exported PSFS filesystems. To ensure that the filesystems are always available, MxFS-Linux provides a global NFS monitor for the system as well as a high-availability monitor for each Export Group.
Chapter 8: Configure MxFS-Linux 136 the node. However, such access is not highly available because there is no Virtual NFS Service associated with those connections. This is not a security concern because the entire cluster should be considered as a single security domain (for this and other reasons). The administrator can track which nodes are exporting highly available NFS services by watching the Export Group high-availability monitors that are active on those nodes.
Chapter 8: Configure MxFS-Linux 137 The “async” option provides better NFS write performance, as it allows writes to be acknowledged before they are committed to disk; however, it provides less coherency with regard to on-disk contents in the cluster. Consequently, a server crash can lead to silent data loss; the NFS client may never be aware that a given server crashed because MxFS-Linux provides seamless failover. Add an Export Group To create an Export Group, select Matrix > Add > Add Export Group.
Chapter 8: Configure MxFS-Linux 138 assign the Export Group to one or more Virtual NFS Services later in this procedure, the Virtual NFS Services will be added to this application. The Add Export Groups window has three tabs. The NFS Exports and Virtual NFS Services tabs are used to create the Export Group. The third tab, Status, reports the current state of the Export Group. You will need to complete the NFS Exports and Virtual NFS Services tabs.
Chapter 8: Configure MxFS-Linux 139 An export record uses the following format, which is documented in the Linux exports(5) man page:
<exported path> <client>[(<options>)] [<client>(<options>)] ...
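For example, a record that exports a PSFS mountpoint read-write to a single subnet might look like the following; the path, netmask, and options are illustrative only:
/mnt/psfs/data1 192.168.10.0/255.255.255.0(rw,async)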
Chapter 8: Configure MxFS-Linux 140 Exported Path. Specify the filesystem or directory to be exported. Client Names: Click Add to specify a client to be allowed to mount the exported filesystem. Then, on the Add NFS Client dialog, identify the client by either its Netmask, IP address, or FQDN. Optionally, you can enter an asterisk (*) to specify all clients. Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
Chapter 8: Configure MxFS-Linux 141 Options. Select any options that apply to this record. The default options are selected initially and do not appear on the Preview line. After you have created the export records, you can use the arrow buttons on the NFS Exports tab to reorder the entries as you prefer them. NOTE: When the Export Group is created, MxFS-Linux validates each export record and reports any syntax errors.
Chapter 8: Configure MxFS-Linux 142 If the appropriate Virtual NFS Service does not currently exist, you can create it by clicking the “Create a single Virtual NFS Service” button. (See “Add a Virtual NFS Service” on page 159 for more information.) You can also create multiple Virtual NFS Services in one operation by clicking the “Create a spanning set of Virtual NFS Services” button.
Chapter 8: Configure MxFS-Linux 143 Create a Spanning Set of Virtual NFS Services The “Create a spanning set of Virtual NFS Services” button on the Virtual NFS Services tab enables you to create multiple Virtual NFS Services with a single operation and assign them to the Export Group that you are creating. When you click this button, the dialog “Create a spanning set of Virtual NFS Services” appears and you can configure the Virtual NFS Services.
Chapter 8: Configure MxFS-Linux 144 You can enter the DNS name or IP address for each Virtual NFS Service directly on the dialog, or you can specify a range of virtual IP addresses to be used for the Virtual NFS Services. To specify a range, click the button “Automatically create range of VNFS addresses.” Then specify the range on the dialog that appears next. When you click OK, the IP addresses will be filled in on the “Create a Spanning Set of Virtual NFS Services” dialog.
Chapter 8: Configure MxFS-Linux 145 Virtual NFS Service will be up than on the node currently hosting the Virtual NFS Service.) When you click OK on the “Create a Spanning Set of Virtual NFS Services” dialog, you will return to the Add Export Group dialog. Then click Apply or OK to create the Export Group, or configure the advanced options for the high-availability monitor as described under “Advanced Options for Export Group Monitors” on page 146.
Chapter 8: Configure MxFS-Linux 146 The monitor also attempts to perform a direct I/O read of a few blocks on the filesystem. If these additional I/O reads cause problems on your system, you can disable them via the global NFS probe settings. The Export Group monitor, which is labeled “Export Group <name>,” looks like this on the Servers tab of the Management Console. The Export Group monitor also appears on the Virtual Hosts tab.
Chapter 8: Configure MxFS-Linux 147 Frequency: The interval of time, in seconds, at which the monitor checks that each exported path in the Export Group is available and is mounted on the PSFS filesystem. The default is 30 seconds. To set the frequency from the command line, use this option: --frequency <seconds> Probe Severity The Probe Severity tab lets you specify the failover behavior of the Export Group monitor.
Chapter 8: Configure MxFS-Linux 148 The Probe Severity setting works with the Virtual NFS Service policy (either AUTOFAILBACK or NOFAILBACK) to determine what happens when a monitor probe fails. The default policies (AUTOFAILBACK for the Virtual NFS Service and AUTORECOVER for the Export Group monitor) cause Matrix Server to fail over the associated Virtual NFS Services to a backup network interface on another node when the monitor probe fails.
Chapter 8: Configure MxFS-Linux 149 AUTORECOVER. This is the default. The Virtual NFS Services fail over when a monitor probe fails. When the original node is again available, failback occurs according to the failback policy for the Virtual NFS Services. NOAUTORECOVER. The Virtual NFS Services fail over when a monitor probe fails and the monitor is disabled on the original node, preventing automatic failback.
Chapter 8: Configure MxFS-Linux 150 Monitors can optionally be configured with scripts that are run at various points during matrix operation. The script types are as follows: Recovery script. Runs after a monitor probe failure is detected, in an attempt to restore the NFS service. Start script. Runs as the NFS service is becoming active on a server. Stop script. Runs as the NFS service is becoming inactive on a server.
Chapter 8: Configure MxFS-Linux 151 This behavior is necessary because the Start and Stop scripts are run to establish the desired start/stop activity, even though the service may actually have been started by something other than MxFS-Linux. The Start and Stop scripts must also handle recovery from events that may cause them to run unsuccessfully. For example, if the system encounters a problem, the script will fail and exit non-zero.
Chapter 8: Configure MxFS-Linux 152 To configure event severity from the command line, use this option: --scriptSeverity consider|ignore Script Ordering Script ordering determines the order in which the Start and Stop scripts are run when a Virtual NFS Service moves from one node to another. If you do not configure Start and Stop scripts for an Export Group, the script ordering configuration has no effect. There are two settings for script ordering. SERIAL. This is the default setting.
Chapter 8: Configure MxFS-Linux 153 View Export Group Properties To view all of the Export Groups configured with MxFS-Linux, select View > View All Export Groups, or select an Export Group on the Virtual Hosts tab or a Virtual NFS Service on the Applications tab, right-click, and select View All Export Groups. The All Export Groups window then appears. This window lists the names of all Export Groups that have been configured.
Chapter 8: Configure MxFS-Linux 154 To view status from the command line, use the following command: mx exportgroup status [--up|--down] [--enabled|--disabled] [--primary|--backup] [--active|--inactive] [<exportgroup> ...] Modify an Export Group To modify an existing Export Group, go to the Export Group Properties window, as described above. To change an export record, go to the NFS Exports tab.
Chapter 8: Configure MxFS-Linux 155 Other Export Group Procedures Disable an Export Group Select the high-availability monitor associated with the Export Group on the Servers or Application tab, right-click, and select Disable. To disable the high-availability monitor from the command line, use this command. If no servers are specified, the action takes place on all servers. mx exportgroup disable ([ALL_SERVERS]|[ ...
Chapter 8: Configure MxFS-Linux 156 Configure the Global NFS Probe Settings The global NFS probe periodically checks the following on each node: • The health of the NFS Server. The probe does this by issuing a NULL RPC call to the NFS Server on the local node. • The general health of the MxFS-Linux high-availability service. The probe does this by checking for critical MxFS-Linux processes. (If the node is initializing or shutting down, the processes may not be running.
Chapter 8: Configure MxFS-Linux 157 To configure the probe settings from the command line, use this command: mx nfsprobe configure [--frequency <seconds>] [--timeout <seconds>] [--directRead enable|disable] To view the configuration, use this command: mx nfsprobe showconfig [--noHeaders] [--csv] The --noHeaders option omits the column headings from the output. The --csv option prints the report in a comma-separated format.
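For example, the following illustrative commands set a 60-second probe interval with a 10-second timeout, disable the direct I/O read, and then display the resulting settings (the values shown are examples only):
# mx nfsprobe configure --frequency 60 --timeout 10 --directRead disable
# mx nfsprobe showconfig --csv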
Chapter 8: Configure MxFS-Linux 158 This type of configuration provides the best possible utilization of resources in the matrix, while also preserving high availability. Active-Passive Failover Configuration In an active-passive configuration, one or more nodes act as a backup for all of the Virtual NFS Services running on the other nodes. The passive nodes are running NFS, but are not the primary for any Virtual NFS Services.
Chapter 8: Configure MxFS-Linux 159 • Virtual NFS Services share a common name-space with other Matrix Server virtual hosts. You cannot create a virtual host and a Virtual NFS Service having the same network address. • After creating Virtual NFS Services, you will need to configure your applications to recognize them. Add a Virtual NFS Service To add a virtual NFS Service, select Matrix > Add > Add Virtual NFS Service. Virtual NFS Service: Enter a hostname or IP address for this Virtual NFS Service.
Chapter 8: Configure MxFS-Linux 160 Always active: If you check this box, upon server failure, the Virtual NFS Service will move to an active server even if the associated Export Group monitor is inactive or down. If the box is not checked, failover will not occur when the associated monitor is inactive or down on all of the backup servers and the Virtual NFS Service will not be made active anywhere. (See “Virtual Host Activeness Policy” on page 215 for details.
Chapter 8: Configure MxFS-Linux 161 To create a Virtual NFS Service from the command line, use this command: mx vnfs add [--exportgroup <exportgroup>|NONE] [--policy autofailback|nofailback] [--activitytype single|always] ... You can view the configuration and status of Virtual NFS Services and the Export Groups associated with them on the Management Console. See “High-Availability Monitors” on page 145 and also “Application States” on page 195.
Chapter 8: Configure MxFS-Linux 162 From the Application Tab Select the Virtual NFS Service on the Applications tab, right-click, and then select Rehost. The Virtual NFS Service Rehost window shown above then appears. From the Command Line Issue the following command, where <vnfs> is the Virtual NFS Service to be rehosted. You will need to specify all network interfaces on which the Virtual NFS Service should be configured (the primary and all backups). mx vnfs move <vnfs> ...
Chapter 8: Configure MxFS-Linux 163 Other Virtual NFS Service Procedures Update a Virtual NFS Service Select the Virtual NFS Service on the Management Console, right-click, and select Properties. To update the Virtual NFS Service from the command line, use this command: mx vnfs update [--exportgroup <exportgroup>|NONE] [--policy autofailback|nofailback] [--activitytype single|always] ...
Chapter 8: Configure MxFS-Linux 164 NFS Clients After MxFS-Linux is configured, your NFS clients can begin accessing the exported PSFS filesystems. Timeout Configuration It is recommended that NFS clients have a minimum timeout value of 120 seconds. NFS failovers typically take much less time, but in a worst-case scenario may approach 120 seconds. Client Mounts To access the shared data on PSFS filesystems, clients simply mount the exported PSFS filesystems.
# mkdir /mnt/data1
# mount -t nfs 99.10.210.
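A complete mount of this form might look like the following sketch, where the Virtual NFS Service address and export path are placeholders for your own values:
# mkdir /mnt/data1
# mount -t nfs <virtual-nfs-address>:/<exported-path> /mnt/data1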
Chapter 8: Configure MxFS-Linux 165 Following are some of the important parameters: Hard Mount Parameters: intr Specify intr if users are not likely to damage critical data by manually interrupting an NFS request. If a hard mount is interruptible, a user can press Ctrl-C or issue the kill command to interrupt an NFS mount that is hanging indefinitely because a server is down.
Chapter 8: Configure MxFS-Linux 166 If you are receiving I/O errors with a soft mount, you may want to consider either switching to a hard mount or raising the timeo and/or retrans parameters to compensate. The maximum time an NFS mount can take to respond before the client receives an I/O error is (retrans * timeo). In the above example, this is 4 * 0.7 = 2.8 seconds.
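For example, the following illustrative soft mount raises both parameters; with timeo expressed in tenths of a second, the maximum delay becomes 6 * 2.0 = 12 seconds (the values and placeholders are examples only):
# mount -t nfs -o soft,timeo=20,retrans=6 <virtual-nfs-address>:/<exported-path> /mnt/data1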
Chapter 8: Configure MxFS-Linux 167 Using the NLM Protocol NLM is the locking protocol used by NFS. By default, it is disabled when MxFS-Linux is installed. If necessary, NLM can be enabled; however, you should be aware of the following caveat: • File locks granted by the NFS server are cluster-coherent. When a failover occurs, the locks are released by the original server and the client automatically reclaims them on the new server (the backup node).
Chapter 8: Configure MxFS-Linux 168 NFS Tuning The default operating system and network settings do not provide optimal NFS performance. Improvements can be made by modifying certain operating system parameters and making adjustments to the network components. NOTE: When tuning NFS, it is important to consider your workload and environment. It is up to you to determine the best settings for optimizing NFS performance.
Chapter 8: Configure MxFS-Linux 169 Kernel In the kernel, certain tunable parameters affect allocations for buffers and the TCP core. TCP window scaling, which is enabled by default, allows the receiving system to scale the local TCP receive window larger than the default 64 KB. This occurs only if the system settings for the default and maximum TCP socket buffer sizes are set to allow larger than “normal” buffers.
Chapter 8: Configure MxFS-Linux 170 interface. (Check the documentation for your NIC to determine whether changing the value is supported.) The adjustment may allow the network stack to continue queuing data to the interface after the internal hardware buffers are completely used. Be conservative when increasing the value. The recommended value is 5000. NFS Clients See “NFS Clients” on page 164 for information about mount options that can affect client operations.
Chapter 8: Configure MxFS-Linux 171
• net.ipv4.tcp_sack: Enable/disable TCP selective ACKs. Because of a bug in the Linux kernel, we recommend that this feature be disabled. OS default: 1. Recommended: 0.
• net.core.rmem_default: Default global socket buffer size for read sockets, all protocols. OS default: 135168 (132 KB). Recommended: 524287.
• net.core.rmem_max: Default global maximum socket buffer size for receive sockets, all protocols. OS default: 131071 (128 KB). Recommended: 16777216.
net.core.
Chapter 8: Configure MxFS-Linux 172
• NFS server threads: When MxFS-Linux is installed, it sets the number of threads to 32 in the /etc/sysconfig/nfs file. (On Red Hat, the value is set via the RPCNFSDCOUNT parameter. On SLES9, the parameter is USE_KERNEL_NFSD_NUMBER.) OS default: 4. Recommended: 32.
Adjust the Tuning Parameters To apply the parameter changes listed above, add the following entries to the /etc/sysctl.conf file, which is read at system startup.
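A sketch of the corresponding /etc/sysctl.conf entries, filled in only with the recommended values visible in the table above; the remaining parameters from the table follow the same pattern. The standard sysctl -p command reloads the file without a reboot.
# NFS tuning (recommended values from the preceding table)
net.ipv4.tcp_sack = 0
net.core.rmem_default = 524287
net.core.rmem_max = 16777216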
Chapter 8: Configure MxFS-Linux 173 Each name in /etc/sysctl.conf and its corresponding location in /proc:
net.core.rmem_max: /proc/sys/net/core/rmem_max
net.core.wmem_max: /proc/sys/net/core/wmem_max
net.ipv4.tcp_rmem: /proc/sys/net/ipv4/tcp_rmem
net.ipv4.tcp_wmem: /proc/sys/net/ipv4/tcp_wmem
net.ipv4.tcp_timestamps: /proc/sys/net/ipv4/tcp_timestamps
net.ipv4.tcp_sack: /proc/sys/net/ipv4/tcp_sack
net.core.rmem_default: /proc/sys/net/core/rmem_default
net.core.wmem_default: /proc/sys/net/core/wmem_default
net.core.
Chapter 8: Configure MxFS-Linux 174 Adjust the NIC Parameters To set the NIC parameters, you will need to add commands to the appropriate file for your operating system, as described below. The files are executed after the system is booted, specifically after all run-level scripts are completed. • On RHEL4 systems, add the commands to the /etc/rc.local file. This file is typically created when the OS is installed. • On SLES9 systems, add the commands to the /etc/after.local file.
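A hedged example of such a command, assuming the value being raised is the interface transmit queue length (txqueuelen) and that eth1 is the interface serving NFS traffic:
/sbin/ifconfig eth1 txqueuelen 5000   # added to /etc/rc.local (RHEL4) or /etc/after.local (SLES9)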
9 Manage Filesystem Quotas The PSFS filesystem supports disk quotas, including both hard and soft limits. After quotas are enabled on a filesystem, you can use the Quotas editor provided with the Management Console to view or set quotas for specific users and groups. Hard and Soft Filesystem Limits The PSFS filesystem supports both hard and soft filesystem quotas for users and groups.
Chapter 9: Manage Filesystem Quotas 176 When you create a PSFS filesystem, you will need to specify whether quotas should be enabled or disabled on that filesystem. (See “Create a Filesystem” on page 98.) Quotas can also be enabled or disabled on an existing filesystem. The filesystem must be unmounted. Locate the filesystem on the Management Console, right-click, and select Properties. Then go to the Quotas tab on the Properties dialog. Check or uncheck “Enable quotas” as appropriate.
Chapter 9: Manage Filesystem Quotas 177 default limit, click “Unlimited.”) The default quotas apply to all users and groups; however, you can change the quota for a specific user or group. To enable or disable quotas from the command line, use mx quota command or the psfsquota or psfsck utilities. The psfsquota and psfsck utilities provide the following options to enable or disable quotas on a PSFS filesystem: • --enable-quotas Build the necessary quota infrastructure on the specified filesystem.
Chapter 9: Manage Filesystem Quotas 178 The following example enables quotas on volume psv1 and sets the default user and group quotas to 20 gigabytes. /opt/polyserve/sbin/psfsquota --enable-quotas --set-udq 20G --set-gdq 20G psv1 For more information about these utilities and the mx quota command, see the PolyServe Command Reference.
Chapter 9: Manage Filesystem Quotas 179 The default display includes columns for the name and common name of the user or group, the hard limit, the disk space currently used, and the percent of the hard limit that has been reached. NOTE: When an asterisk (*) appears in the “Hard Limit” column of the quota report, it means that the hard limit is set to the default for the filesystem. You can add columns to the display for the user or group ID and the soft limit.
Chapter 9: Manage Filesystem Quotas 180 Quota Searches You can use the search feature on the left side of the quota editor to locate quota information for specific users or groups. If you are searching by name, the quota information must be in a database (such as a password file or LDAP database) that can be accessed from the server where the filesystem is mounted. The search locates the name in the database and matches it with the ID, which is the value stored on the filesystem.
Chapter 9: Manage Filesystem Quotas 181 The basic search procedure is as follows: • Enter a search pattern if desired. If the pattern is a regular expression, click on “Regular Expression.” The search rules are as follows: – Searches for names can include the following wildcards: * ? [ ] – A search pattern consisting of a single asterisk matches all IDs. – Use commas to separate individual names or IDs (such as jperry,twhite,rjohnson or 1500,1503,1510).
Chapter 9: Manage Filesystem Quotas 182 View or Change Limits for a User or Group To see the limits assigned to a particular user or group, highlight that user (or group) on the Quotas dialog and then click the Properties icon on the toolbar. You can set the hard and soft quota limits as necessary. Add Quotas for a User or Group To assign quotas to a user or group, click the Add button on the Quota editor toolbar and then search for the user or group in the database.
Chapter 9: Manage Filesystem Quotas 183 If you know the user or group ID and want to skip the search (or if the LDAP or password file is missing), click Advanced and enter the ID on the Advanced User/Group Add dialog. You can specify ranges of IDs or a list of IDs separated by commas. Then select the type of search (User or Group) and click Add. Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
Chapter 9: Manage Filesystem Quotas 184 NOTE: If the user or group has been assigned quotas on another filesystem, you can highlight the entry for that user or group on that filesystem and then select Edit > Insert to open the Add Quota dialog. When the Add Quota dialog appears, select the appropriate filesystem and set the quota limits. Any existing quota limits on the filesystem will be overwritten. When you click OK, the quota limits will be assigned to the user or group.
Chapter 9: Manage Filesystem Quotas 185 Remove Quotas for a User or Group If a particular user or group no longer owns files on a filesystem, you can remove the quotas for that user or group. Select the user (or group) on the Quotas dialog and then click the Delete icon on the toolbar. NOTE: The quotas cannot be removed if the user or group has blocks allocated on the filesystem.
Chapter 9: Manage Filesystem Quotas 186 NOTE: You do not need to install this RPM to use quotas with PSFS filesystems. With the exception of warnquota, all of the functionality of these commands is provided by the Quotas window on the Management Console and by the mx quota commands. Be sure to invoke the Matrix Server versions of these commands instead of the versions provided with the Linux distribution.
Chapter 9: Manage Filesystem Quotas 187 The -f option specifies the file that psfsrq should read to obtain the quota data. If this option is not specified, psfsrq reads from stdin. filesystem is the psd or psv device used for the filesystem. Examples The following command saves the quota information for the filesystem located on device psd1p5. # psfsdq -f psd1p5.quotadata psd1p5 The next command restores the data to the filesystem: # psfsrq -f psd1p5.
10 Manage Hardware Snapshots Matrix Server provides support for taking hardware snapshots of PSFS filesystems. The subdevices on which the filesystems are located must reside on one or more storage arrays that are supported for snapshots. Snapshot support can be configured either on the Management Console “Configure Matrix” window or via the mxconfig utility. (See the PolyServe Installation Guide for more information.) This procedure creates a snapshot configuration file on each server.
Chapter 10: Manage Hardware Snapshots 189 HP EVA Storage Arrays To take hardware snapshots on Hewlett-Packard StorageWorks Enterprise Virtual Array (EVA) storage arrays, the latest version of the HP StorageWorks Scripting System Utility (SSSU) must be installed on all servers in the cluster. Also, the latest version of CommandView EVA software must be installed on your Management Appliance. Be sure that your versions of SSSU and CommandView EVA are consistent.
Chapter 10: Manage Hardware Snapshots 190 The dialog describes the information that you will need to enter. When you complete the information and click OK, Matrix Server takes these steps: • Quiesces the filesystem to ensure that the snapshot can be mounted cleanly. • Performs the snapshot operation using the snapshot capability provided by the array. Matrix Server uses the next available LUNs for the snapshot. • Resumes normal filesystem activity. • Imports the LUNs used for the snapshot into the matrix.
Chapter 10: Manage Hardware Snapshots 191 To create a snapshot from the command line, first run the following command to determine the options available for the array type on which the specified volume is located: mx snapshot showcreateopt <volume> Then run the following command to create the snapshot: mx snapshot create [--terse] [<create options>] <volume> The --terse option causes only the name of the snapshot volume to be printed on success.
Chapter 10: Manage Hardware Snapshots 192 snapshot, the Mount Filesystem window appears, allowing you to select mount options. To mount a snapshot from the command line, type the following: mx fs mount To unmount a snapshot from the command line, type the following: mx fs unmount Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
11 Matrix Operations on the Applications Tab The Applications tab on the Management Console shows all Matrix Server applications and resources configured in the matrix and enables you to manage and monitor them from a single screen. Applications Overview An application provides a way to group associated matrix resources (Virtual NFS Services and Export Groups, Matrix Server virtual hosts, service monitors, and device monitors) so that they can be treated as a unit.
Chapter 11: Matrix Operations on the Applications Tab 194 Virtual NFS Service or virtual host as the application name. Similarly, if you do not specify an application name for a device monitor, the application will use the same name as the monitor. The Applications Tab The Management Console lists applications and their associated resources (Virtual NFS Services and Export Groups, virtual hosts, service and device monitors) on the Applications tab.
Chapter 11: Matrix Operations on the Applications Tab 195 The servers on which the resources are configured appear in the columns. You can change the order of the server columns by dragging a column to another location. You can also resize the columns. The cells indicate whether a resource is deployed on a particular server, as well as the current status of the resource. If a cell is empty, the resource is not deployed on that server.
Chapter 11: Matrix Operations on the Applications Tab 196 The possible states for the application, each shown with its own icon, are OK, Warning, and Error.
• OK: Clients can access the application. For example, an Export Group is healthy. All of its high-availability monitors are currently running and active on the associated nodes. All of the Virtual NFS Services associated with the Export Group are currently running and active on their primary interfaces.
• Warning: Clients can access the application but not from the primary node.
Chapter 11: Matrix Operations on the Applications Tab 197 Filter the Applications Display You can use filters to limit the information appearing on the Application tab. For example, you may want to see only a certain type of monitor, or only monitors that are down or disabled. You can use filters to do this. To add a filter, click the “Add Filter” tab and then configure the filter. Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
Chapter 11: Matrix Operations on the Applications Tab 198 Name: Specify a name for this filter. On the Type tab shown above, select the types of virtual hosts, service monitors, device monitors, and solution-specific devices that you want to see. Click on the State tab to select specific states that you are interested in viewing. (The Applications tab will be updated immediately.) Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
Chapter 11: Matrix Operations on the Applications Tab 199 Click OK to close the filter. The filter then appears as a separate tab and will be available to you when you connect to any cluster. To modify an existing filter, select that filter, right-click, and select Edit Filter. To remove a filter, select the filter, right-click, and select Delete Filter.
Chapter 11: Matrix Operations on the Applications Tab 200 When you reach a cell that accepts drops, the cursor will change to an arrow. The following drag and drop operations are allowed. Applications These operations are allowed only for applications that include at most only one virtual host. • Assign an application to a server. Drag the application from the Name column to the empty cell for the server.
Chapter 11: Matrix Operations on the Applications Tab 201 • Switch the primary and backup servers (or two backup servers) for a virtual host. Drag the virtual host from one server cell to the cell for the other server. If the virtual host is active, this operation can disconnect existing applications that depend on the virtual host. When the operation is complete, the ordering for failover will be switched. • Remove a virtual host from a server.
Chapter 11: Matrix Operations on the Applications Tab 202 reordered as necessary. If the monitor was multi-active, it will remain active on any other servers on which it is configured. (The device monitor cannot be removed via drag and drop if it is configured on only one server.) Menu Operations Applications The following operations affect all entities associated with a Matrix Server application.
Chapter 11: Matrix Operations on the Applications Tab 203 Virtual Hosts When you right-click on a virtual host, you can perform the same operations as are available on the Servers or Virtual Hosts tab. • Re-host, or move, the virtual host to another node in the matrix. • Add a service monitor. • Enable or disable the virtual host. • View or change the properties for the virtual host. • Delete the virtual host.
Chapter 11: Matrix Operations on the Applications Tab 204 • Add a new Export Group. • Rehost the Export Group. • Enable or disable the Export Group on all servers. • View all Export Groups. You can also administer Export Groups at the server level. In the server column, select the cell corresponding to the Export Group. Right-click to display the following options: • View properties for this Export Group, including status. • Delete this Export Group. • Enable or disable the Export Group on this server.
Chapter 11: Matrix Operations on the Applications Tab 205 For more information about using these procedures on service monitors, see “Add or Modify a Service Monitor” on page 225 and “Other Configuration Procedures” on page 233. For more information about using these procedures on device monitors, see “Add or Modify a Device Monitor” on page 240 and “Other Configuration Procedures” on page 253. Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
12 Configure Virtual Hosts Matrix Server uses virtual hosts to provide failover protection for servers and network applications. Overview A virtual host is a hostname/IP address configured on a set of network interfaces. Each interface must be located on a different server. The first network interface configured is the primary interface for the virtual host. The server providing this interface is the primary server.
Chapter 12: Configure Virtual Hosts 207 Matrix Health and Virtual Host Failover To ensure the availability of a virtual host, Matrix Server monitors the health of the administrative network, the active network interface, and the underlying server. If you have created service or device monitors, those monitors periodically check the health of the specified services or devices.
Chapter 12: Configure Virtual Hosts 208 The failover operation to another network interface has minimal impact on clients. For example, if clients were downloading Web pages during the failover, they would receive a “transfer interrupted” message and could simply reload the Web page. If they were reading Web pages, they would not notice any interruption. If the active network interface fails, only the virtual hosts associated with that interface are failed over.
Chapter 12: Configure Virtual Hosts 209 Also note the following: • After creating virtual hosts, you will need to configure your applications to recognize them. • As of the 3.5.1 release, Matrix Server no longer uses the old aliases interface but instead uses the NETLINK facility to configure virtual hosts as secondary addresses. As a result, ipconfig(8) should no longer be used to query for Matrix Server virtual hosts. Instead, use ip(8) for the queries.
Chapter 12: Configure Virtual Hosts 210 Virtual Host: Type a hostname or an IP address for the virtual host. Application name: An “application” provides a way to group virtual hosts and related service and device monitors on the Applications tab. All of the Matrix Server resources associated with the application can then be treated as a unit. You can specify a new application name, select an existing application name, or leave this field blank.
Chapter 12: Configure Virtual Hosts 211 As part of configuring a virtual host, you will need to select network interfaces located on the servers that can be used for the virtual host. The interfaces are placed in order: primary, backup #1, backup #2, and so on. The ClusterPulse process considers the “health” of the servers providing those interfaces when determining where to place a virtual host.
Chapter 12: Configure Virtual Hosts 212 The first interface you select will be the primary interface. The other interfaces you select will be backups. The virtual host now appears on the Management Console. The following view is from the Virtual Hosts tab. To add or update a virtual host from the command line, use the following command. The first network interface specified is the primary interface and the additional interfaces are backups.
Chapter 12: Configure Virtual Hosts 213 Other Virtual Host Procedures Change the Virtual IP Address for a Virtual Host When you change the virtual IP address of a virtual host, you will also need to update your name server and to configure applications to recognize the new virtual IP address. The order in which you perform these tasks is dependent on your application and the requirements of your site. You can use mx commands to change the virtual IP address of a virtual host. Complete these steps: 1.
Chapter 12: Configure Virtual Hosts 214 When you make your changes and click OK, you will see a message warning that this action may cause a disruption of service. Your changes will occur when you confirm the update. To move a virtual host from the command line, use this command: mx vhost move [--policy autofailback|nofailback] [--application ] [--activitytype single|always] ([...
Chapter 12: Configure Virtual Hosts 215 Virtual Hosts and Failover When you create a virtual host, you specify a list of network interfaces on which the virtual host can be located. The interfaces are placed in order: primary, backup #1, backup #2, and so on. The ClusterPulse process considers the “health” of the servers providing those interfaces when determining where to place a virtual host.
Chapter 12: Configure Virtual Hosts 216 • A server is considered down if it loses coordinated communication with the matrix (for example, the server crashed or was shut down, Matrix Server was shut down on that server, the server failed to schedule a matrix group communication process for an extended period of time, the server disabled the NIC being used for matrix network traffic, and so on). • The PanPulse process controls whether a network interface is marked up or down.
Chapter 12: Configure Virtual Hosts 217 Customize Service and Device Monitors for Failover By default, when a service or device monitor probe fails, indicating that the watched service is down or the monitored device cannot be accessed, ClusterPulse will fail over the associated virtual host to another server where the monitored service or device is up. You can customize this behavior using the Advanced monitor settings.
Chapter 12: Configure Virtual Hosts 218 You can use the following Advanced settings to affect how ClusterPulse selects the network interface for failover. • The Event Severity setting allows you to specify whether ClusterPulse should consider the existence of monitor events (such as a script failure or timeout) when it chooses a network interface for failover. If the events are considered, the network interface for the affected server becomes less desirable.
Chapter 12: Configure Virtual Hosts 219 service monitor probe fails on node 1, the virtual host will fail over to node 2. Following are some possible failback scenarios: • When the monitored service is restored on node 1, the virtual host will remain on node 2. Node 1 and node 2 are equally healthy; they both have three up service monitors. The NOFAILBACK policy will not move the virtual host until a healthier server is available.
Chapter 12: Configure Virtual Hosts Virtual Host Policy 220 Monitor Probe Severity Behavior When Probe Reports DOWN AUTORECOVER Failover occurs. The virtual host remains on the backup server until a “healthier” server is available. NOAUTORECOVER Failover occurs and monitor is disabled on the original server. The virtual host remains on the backup server until a “healthier” server is available. Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
13 Configure Service Monitors Service monitors are typically used to monitor a network service such as HTTP or FTP. If a service monitor indicates that a network service is not functioning properly on the primary server, Matrix Server can transfer the network traffic to a backup server that also provides that network service. Overview Before creating a service monitor for a particular service, you will need to configure that service on your servers.
Chapter 13: Configure Service Monitors 222 severity, Start scripts, and Stop scripts) are consistent across all servers configured for a virtual host. Service Monitors and Failover If a monitored service fails, Matrix Server attempts to relocate any virtual hosts associated with the service monitor to a network interface on a healthier server.
Chapter 13: Configure Service Monitors 223 FTP Service Monitor By default the FTP service monitor probes TCP port 21 of the virtual host address. You can change this port number to the port number configured for your FTP server. The default frequency of the probe is every 30 seconds. The default time that the service monitor waits for a probe to complete is five seconds. The probe function attempts to connect to port 21 and expects to read an initial message from the FTP server.
Chapter 13: Configure Service Monitors 224 server. If there are no errors, the service status remains Up. If an error occurs, the status is set to Down. TCP Service Monitor The generic TCP service monitor defaults to TCP port 0. You should set the port to the listening port of your server software. The default frequency of the probe is every 30 seconds. The default time that the service monitor waits for a probe to complete is five seconds.
Chapter 13: Configure Service Monitors 225 Add or Modify a Service Monitor Adding a service monitor configures Matrix Server monitoring only. It does not configure the service itself. To add or update a service monitor, select the appropriate option: • To add a new service monitor, first select the virtual host for the monitor on either the Servers or Virtual Hosts window, then rightclick and select Add Service Monitor (or click the Service icon on the toolbar).
Chapter 13: Configure Service Monitors 226 Timeout: The maximum amount of time that the monitor_agent daemon will wait for a probe to complete. For most monitors, the default timeout interval is five seconds. You can use the default setting or specify a new timeout interval. Frequency: The interval of time, in seconds, at which the monitor probes the designated service. You can use the default setting, typically 30 seconds, or enter a new frequency interval.
Chapter 13: Configure Service Monitors 227 To add or update a service monitor from the command line, use this command: mx service add|update [--type ] [--timeout ] [--frequency ] [] ... NOTE: The --type argument cannot be used with mx service update. See “Advanced Settings for Service Monitors” for information about the other arguments that can be specified for service monitors.
Chapter 13: Configure Service Monitors 228 Timeout and Failure Severity This setting works with the virtual host policy (either AUTOFAILBACK or NOFAILBACK) to determine what happens when a probe of a monitored service fails. The default policies (AUTOFAILBACK for the virtual host and AUTORECOVERY for the monitor) cause ClusterPulse to fail over the associated virtual host to a backup network interface on another server.
Chapter 13: Configure Service Monitors 229 NOAUTORECOVER. The virtual host fails over when a monitor probe fails and the monitor is disabled on the original node, preventing automatic failback. When the monitor is reenabled, failback occurs according to the virtual host’s failback policy. The NOAUTORECOVER option is useful when integrating Matrix Server with a custom application where certain application-specific actions must be taken before the failback can occur.
Chapter 13: Configure Service Monitors 230 takes place on that node. The monitor instances on other nodes are marked as “standby” on the Management Console. If the virtual host fails over to a backup node, the monitor instance on the original node becomes inactive and the probe is no longer run on that node. Matrix Server activates the virtual host on the new node, which causes the monitor instance on that node to change status from “standby” to “active.
Chapter 13: Configure Service Monitors 231 Scripts Service monitors can optionally be configured with scripts that are run at various points during matrix operation. The script types are as follows: Recovery script. Runs after a monitor probe failure is detected, in an attempt to restore the service. Start script. Runs as a service is becoming active on a server. Stop script. Runs as a service is becoming inactive on a server.
Chapter 13: Configure Service Monitors 232 can create a custom Start or Stop script for the action and specify it on the Scripts tab for the monitor. The default order for starting is: • Run the monitor’s starting activities (if any) • Run the custom Start script (if any) If you want to reverse this order, preface the Start script with the prefix [pre] on the Scripts tab.
Chapter 13: Configure Service Monitors 233 --scriptSeverity consider|ignore Script Ordering Script ordering determines the order in which Matrix Server runs Start and Stop scripts when a virtual host moves from one server to another. If you do not configure a monitor with Start and Stop scripts, the script ordering configuration has no effect. There are two settings: SERIAL. This is the default setting.
Chapter 13: Configure Service Monitors 234 Delete a Service Monitor Select the service monitor to be deleted, right-click, and select Delete. To delete a service monitor from the command line, use this command: mx service delete Disable a Service Monitor on a Specific Server Select the service monitor to be disabled, right-click, and select Disable. To disable a service monitor from the command line, use this command: mx service disable ...
Chapter 13: Configure Service Monitors 235 Clear Service Monitor Errors Select the service monitor where the event occurred, right-click, and select Clear Last Event. To clear a monitor event from the command line, use this command: mx service clear ... Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
14 Configure Device Monitors PolyServe Matrix Server provides built-in device monitors that can be used to watch disk devices or to monitor the status of PSFS filesystems. You can also create custom device monitors. Overview Matrix Server provides the following types of device monitors. To configure a device monitor, you will need to specify the probe timeout and frequency and a monitor-specific value.
Chapter 14: Configure Device Monitors 237 (“Associated” service and device monitors are those monitors that are associated with the same virtual host as this device monitor.) • Single-Always-Active. The monitor is active on only one of the selected servers. Upon server failure, the monitor will fail over to an active server even if all associated service and device monitors are down.
Chapter 14: Configure Device Monitors 238 When you create a SHARED_FILESYSTEM device monitor, be sure to configure the following advanced options: • Virtual hosts. Select any virtual hosts that should fail over if the monitor probe reports a DOWN status for the filesystem. • Servers. Select all servers that have mounted the filesystem and are running the applications associated with the virtual hosts. You might also want to create Start, Stop, or Recovery scripts to customize the behavior of the monitor.
Chapter 14: Configure Device Monitors 239 enablement of the servers when determining where to activate a device monitor. Device Monitor Activeness Policy ClusterPulse uses the following device monitor activeness policy to determine the server or servers where it will make a device monitor active. The policy described here is accurate for this release but it may change in future releases or be modified by other PolyServe products. The device monitor activeness policy decision is made as follows: 1.
Chapter 14: Configure Device Monitors 240 5. If there are no servers with completely healthy services, ClusterPulse picks a server that has at least one service up and enabled. If ClusterPulse finds a server meeting these conditions, it will use it, preferring services earlier in the list of servers for this device monitor. 6.
Chapter 14: Configure Device Monitors 241 Device name: Type the name of the device monitor. You can use up to 32 alphanumeric characters. Application name: Specify the name of the Matrix Server application to be associated with this device monitor. Matrix Server applications are used to group related virtual hosts, service monitors, and device monitors on the Applications tab. If you leave this field blank, Matrix Server will use the name of the device monitor as the application name.
Chapter 14: Configure Device Monitors 242 decimal IP address of the hostname for the server, and is the name assigned to the SHARED_FILESYSTEM device monitor. The following example shows a monitor created on the server r18. To add a device monitor from the command line, use this command: mx device add --type --servers ,,... [--application ] [--timeout ] [--frequency ] [--parameters ] [otherarguments] ...
Chapter 14: Configure Device Monitors 243 Probe Severity The Probe Severity tab lets you specify the failover behavior of the device monitor. The Probe Severity setting works with the virtual host policy (either AUTOFAILBACK or NOFAILBACK) to determine what happens when a monitored device fails.
Chapter 14: Configure Device Monitors 244 NOFAILOVER. When the monitor probe fails, ClusterPulse does not fail over to a backup network interface. This option is useful when the monitored resource is not critical, but is important enough that you want to keep a record of its health. AUTORECOVER. This is the default. The virtual host fails over when a monitor probe fails. When device access is recovered on the original node, failback occurs according to the virtual host’s failback policy. NOAUTORECOVER.
Chapter 14: Configure Device Monitors 245 Custom Scripts The Scripts tab lets you configure custom Recovery, Start, and Stop scripts for a device monitor. Device monitors can optionally be configured with scripts that are run at various points during matrix operation. The script types are as follows: Recovery script. Runs after a monitor probe failure is detected, in an attempt to restore the device. Start script. Runs as a device is becoming active on a server. Stop script.
Chapter 14: Configure Device Monitors 246 Start scripts must be robust enough to run when the device is already started, without considering this to be an error. Similarly, Stop scripts must be robust enough to run when the device is already stopped, without considering this to be an error. In both of these cases, the script should exit with a zero exit status.
Chapter 14: Configure Device Monitors 247 If you want to reverse this order, preface the Stop script with the prefix [post] on the Scripts tab. Event Severity By default, Matrix Server treats the failure or timeout of a Start or Stop script as a failure of the associated monitored device and may initiate failover of the associated virtual hosts. Configuration errors can also cause this behavior.
Chapter 14: Configure Device Monitors 248 1. Matrix Server runs the Stop script on all servers where the shared device or virtual host should be inactive. 2. Matrix Server waits for all Stop scripts to complete. 3. Matrix Server runs the Start script on the server where the virtual host or shared device is becoming active. The SERIAL setting considers events and takes precedence over the setting for Event Severity. PARALLEL.
Chapter 14: Configure Device Monitors 249 When a device monitor detects a failure, Matrix Server attempts to fail over the active virtual hosts associated with that device monitor. By default, all virtual hosts on the servers used with the device monitor are dependent on the device monitor. However, you can specify that only certain virtual hosts be dependent on the device monitor. For example, you might have a DISK monitor for a disk containing Web and FTP files.
Chapter 14: Configure Device Monitors 250 Servers for Device Monitors The Servers tab allows you to select the servers on which the device monitor will be configured. You can also set some options related to the monitor probe operation and failover. Probe Type. The servers on which the monitor probe will occur. Select Single-Probe to conduct the probe only on the server where the monitor is active. Select Multi-Probe to conduct the probe on all servers configured for the monitor. Activity Type.
Chapter 14: Configure Device Monitors 251 down. (“Associated” service and device monitors are those monitors that are associated with the same virtual host as this device monitor.) • Multi-Active. The monitor is active simultaneously on all selected servers. This option sets the probe type to Multi-Probe. The Selected Servers column will not display P or B (for primary or backup). Available Servers/Selected Servers.
Chapter 14: Configure Device Monitors 252 Set a Global Event Delay A device monitor that is configured to be multi-active or to probe on multiple servers can experience a global event, in which the shared resource being monitored is reported to be down on all servers. When the shared resource becomes active again, the monitor probe on each server will report that the resource is up.
Chapter 14: Configure Device Monitors 253 Delay: Type the number of seconds that the device monitor should wait before failing over virtual hosts following a global event. The default is 65 seconds. To determine the number of seconds for the delay, check the probe frequency and probe timeout values of the shared resource monitors in your configuration. Using the monitor with the largest values, add together the number of seconds specified for the probe frequency and probe timeout.
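For example, if the monitor with the largest values has a probe frequency of 40 seconds and a probe timeout of 20 seconds (hypothetical numbers), the delay should be at least 40 + 20 = 60 seconds; adding a small margin on top of that sum gives a figure in line with the 65-second default.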
Chapter 14: Configure Device Monitors 254 To enable a device monitor from the command line, use this command: mx device enable ... View Device Monitor Errors To view the last error that occurred on a device monitor, select that monitor, right-click, and select View Last Error. Clear Device Monitor Errors To clear an error from a device monitor, select that monitor, right-click, and select Clear Last Event.
15 Configure Notifiers If you would like certain actions to take place when matrix events occur, you can configure notifiers that define how the events should be handled. Overview Matrix Server uses notifiers to enable you to view event information generated by servers, network interfaces, virtual hosts, service monitors, device monitors, and filesystems. Notifiers send events from these entities to user-defined notifier scripts.
Chapter 15: Configure Notifiers 256 the event and entity combination. An event causes the script, which has its standard input wired to a pipe from the notifier_agent, to be run. The notifier script will be run with any arguments that you included in the script string. The script may read STDIN to accept the event message.
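As a minimal sketch of such a script, the following reads the event message from standard input, appends it to a log file, and mails it to an administrator; the log file path and mail recipient are assumptions, not part of the product:
#!/bin/sh
# Hypothetical notifier script: read the event message from STDIN,
# append it to a local log file, and mail it to an administrator.
msg=`cat`    # the event message arrives on standard input
echo "`date`: $msg" >> /var/log/my_notifier.log
echo "$msg" | mail -s "Matrix Server event" admin@example.com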
Chapter 15: Configure Notifiers 257 Name: Enter a name for the notifier. You can use up to 32 alphanumeric characters. Script: Enter the name of the script that will be run when an event occurs. Event: Check the events for which you want to receive notification. Entity: Check the entities for which you want to receive notification. The USER1 - USER7 entities are user-defined entities for the mxlogger command. See “Add Your Own Messages to the Matrix Log File” on page 317.
Chapter 15: Configure Notifiers 258 Enable a Notifier Select the notifier to be enabled from the Notifiers window, right-click, and select Enable. To enable a notifier from the command line, use this command: mx notifier enable ... Test a Notifier Select the notifier to be tested from the Notifiers window, right-click, and select Test. The event messages for each configured entity will now be sent to the notifier.
Chapter 15: Configure Notifiers 259 Sample Notifier Messages Following is an example of a notifier message: 10.10.1.1 State VHOSTS 130 Oct 31 2000 13:13:00 Virtual host change - 10.1.1.1 now active on 10.10.1.1 The Test Notifier option causes a test event to be generated for each of the event/entity combinations that you configure for the notifier. Following is an example: 10.10.1.
16 Performance Monitoring Matrix Server includes a Performance Dashboard that you can use to monitor the following: • Average CPU utilization • Average committed physical memory utilization • Average swap memory utilization • Total PSFS filesystem I/O transfer rate • Total PSFS filesystem I/Os per second • Average one-minute run-queue depth View the Performance Dashboard The Performance Dashboard can report performance information for either all servers in the matrix or a specific server.
Chapter 16: Performance Monitoring 261 The display includes six performance counters. Each counter shows the aggregate value of that counter for all of the servers in the matrix. For example, the average CPU utilization counter shows the average for the CPUs on all of the servers. The status panel at the bottom of the display includes a timestamp showing when the last set of data was received. A new sample is taken every five seconds.
Chapter 16: Performance Monitoring 262 NOTE: The aggregate value is scheduled to be calculated every five seconds; however, the calculation may be delayed. For example, depending on the activity on the system, the aggregate value could be calculated every seven seconds. As a result, the aggregate value on the dashboard might not match the sum of the counter values on the servers at some report points, as the counter values are updated every five seconds.
Chapter 16: Performance Monitoring 263 The bar in the center of the display shows the current, minimum, maximum, and average value for the instance of the counter that is highlighted on the bottom of the window (the aggregate instance in this example). The values are calculated based on the samples that are visible on the graph, not over the time that has elapsed since monitoring was started. The Duration field specifies the maximum amount of time that can be displayed on the graph (250 seconds).
Chapter 16: Performance Monitoring 264 Display the Dashboard for One Server You can also display the Performance Dashboard for a specific server. Select the server on the Management Console, right-click, and select Matrix Performance Dashboard for Server. The values displayed are just for the selected server.
Chapter 16: Performance Monitoring File Serving Performance Dashboard The File Serving Performance Dashboard provides performance information for Export Groups. This Dashboard works in the same manner as the Matrix Performance Dashboard. To display the File Serving Performance Dashboard, select the Export Group on the Management Console, right-click, and select File Serving Dashboard.
Chapter 16: Performance Monitoring 266 The File Serving Performance Dashboard shows the aggregate value for each counter. Click the Detail button to see information for each individual instance. You can also display the File Serving Performance Dashboard for a particular server. Select the Export Group for that server, right-click, and select File Serving Dashboard for Server.
Chapter 16: Performance Monitoring 267 The arguments are:
• --servers ALL_SERVERS | <server> The server to be monitored. The default is all servers.
• --datasets <number> | UNLIMITED The number of datasets to be returned. The default is one. After the specified number of datasets is returned, the Dashboard will terminate.
• --noHeaders Do not output column headers.
• --csv Generate the report in a comma-separated format.
Chapter 16: Performance Monitoring 268 Average swap memory utilization (%). This counter reports the percentage of allocated swap space that is currently in use by the kernel’s virtual memory subsystem. A high number for this counter could indicate that you need more memory. It might also mean that you need a bigger swap file, as the counter is just reporting a percentage.
Chapter 16: Performance Monitoring 269 Total NFS connections. The approximate number of NFS clients that have been recently active. The number is based on the number of NFS clients that exist in the authorization cache. This monitor can be used to check load balancing of the NFS clients across the servers in the matrix. The File Serving Performance Dashboard shows the aggregate value for each counter. Click the Detail button to see information for each individual instance.
17 Advanced Monitor Topics The topics described here provide technical details about Matrix Server monitor operations. This information is not required to use Matrix Server in typical configurations; however, it may be useful if you want to design custom scripts and monitors, to integrate Matrix Server with custom applications, or to diagnose complex configuration problems.
Chapter 17: Advanced Monitor Topics 271 • For a service monitor, the file must be installed on each server associated with the virtual host on which the service monitor is located. • For a device monitor, the file must be installed on each server that is configured with the virtual hosts associated with the device monitor. • The command termination exit status is used to signal script success or failure. A 0 (zero) exit status indicates script success; any other exit status indicates script failure.
Chapter 17: Advanced Monitor Topics 272 When the monitor executes the testpid script, it will first determine whether the /var/run/application/pid file exists. If the file does not exist, the script exits with a non-zero exit status, which the monitor interprets as a failure. If the file does exist, the script reads the pid from the file into the variable pid. The kill command then determines whether the pid is running. The exit status of the kill command is the exit status of the script.
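A minimal version of the testpid script, reconstructed from the description above, might look like the following; the use of signal 0 with kill is an assumption:
#!/bin/sh
# Hypothetical testpid probe script based on the description above.
PIDFILE=/var/run/application/pid
if [ ! -f "$PIDFILE" ]; then
    exit 1    # no pid file: the monitor interprets this as a failure
fi
pid=`cat "$PIDFILE"`
kill -0 "$pid"    # the exit status of kill becomes the script's exit status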
Chapter 17: Advanced Monitor Topics 273 Recovery script to reduce the frequency of failovers. The script could contain the following line: /etc/rc.d/init.d/myservice restart When you add a recovery script to a service or device monitor, you can set a timeout period, which is the maximum amount of time that the monitor_agent daemon will wait for the Recovery script to complete.
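Wrapped as a complete script, the restart line shown above might look like this; returning the restart command's exit status to the monitor is an assumption:
#!/bin/sh
# Hypothetical Recovery script: restart the failed service and report
# the result of the restart back to the monitor.
/etc/rc.d/init.d/myservice restart
exit $?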
Chapter 17: Advanced Monitor Topics 274 exit non-zero. The service could then become active on another server, causing the Stop script to run on the original server even though the Start script had not completed successfully. When you add Start and Stop scripts to a service or device monitor, you can set a timeout period for each script. Script Environment Variables When you specify a script for a service or device monitor, Matrix Server sets the following environment variables for that script.
Chapter 17: Advanced Monitor Topics 275 Matrix Server does not set any other variables. If a script requires a variable such as a pathname, the script will need to set it. The Effect of Monitors on Virtual Host Failover Typically a virtual host has a primary network interface and one or more backup network interfaces. On the servers supplying the interfaces, the state of the virtual host is either active or inactive.
Chapter 17: Advanced Monitor Topics 276 The first example shows the state transitions that occur at startup from an unknown state. At i1, all instances of the monitor have completed stopping. At i2, the virtual host is configured on the Primary. At i3, the monitor start script begins on the Primary and probing begins on the backups. At i4, probing begins on the Primary.
Chapter 17: Advanced Monitor Topics 277 At i5 in the following example, the probe fails on the Primary. At i6, the virtual host is deconfigured on the Primary. At i7, the monitor stop script begins on the Primary. At i8, the virtual host is configured on the second backup. At i9, the monitor start script begins on the second backup. At i10, probing begins on the second backup.
Chapter 17: Advanced Monitor Topics 278 A custom device monitor also has an activity status on each server. This status indicates the current activity of the monitor on the server. The status can be one of the following: Starting, Active, Suspended, Stopping, Inactive, Failure.
Chapter 17: Advanced Monitor Topics 279 [State-transition table: for each point in time (t1, t2, ...), the virtual host status, service probe status, service monitor activity, device probe status, and device monitor activity on the Primary, First Backup, and Second Backup servers.]
Chapter 17: Advanced Monitor Topics 280 Integrate Custom Applications There are many ways to integrate custom applications with Matrix Server: • Use service monitors or device monitors to monitor the application • Use a predefined monitor or your own user-defined monitor • Use Start, Stop, and Recovery scripts Following are some examples of these strategies.
Chapter 17: Advanced Monitor Topics 281 Built-In Monitor or User-Defined Monitor? To decide whether to use a built-in monitor or a user-defined monitor, first determine whether a built-in monitor is available for the service you want to monitor and then consider the degree of content verification that you need.
Chapter 17: Advanced Monitor Topics 282 and then create a CUSTOM service monitor, specifying the path of the script as the “user probe script” parameter. This provides not only verification of the connection, but a degree of content verification. The CUSTOM monitor can also include Start and Stop scripts. Suppose the myservice application caches transactions induced by requests from external users for later commitment to a back-end database server.
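One plausible use of a Stop script in such a configuration, shown purely as an illustration, is to commit the cached transactions to the back-end database before the service is stopped; the flush helper named here is invented for the sketch:
#!/bin/sh
# Hypothetical Stop script for the myservice example: flush cached
# transactions to the back-end database, then stop the service.
/usr/local/bin/myservice-flush || exit 1    # assumed flush helper
/etc/rc.d/init.d/myservice stop
exit $?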
18 SAN Maintenance The following information and procedures apply to SANs used with Matrix Server. Server Access to the SAN When a server is either added to the matrix or rebooted, Matrix Server needs to take some administrative actions to make the server a full member of the matrix with access to the shared filesystems on the SAN. During this time, the Management Console reports the message “Joining matrix” for the server.
Chapter 18: SAN Maintenance 284 The Management Console typically displays an alert message when a server loses access to the SAN. (See Appendix B for more information about these messages.) Membership Partitions Matrix Server uses a set of membership partitions to control access to the SAN and to store the device naming database, which includes the global device names for SAN disks imported into the matrix. Typically, the membership partitions are created when you install Matrix Server.
Chapter 18: SAN Maintenance 285 Following is some sample output. The command was issued on host 99.10.30.3. The SDMP administrator is the administrator for the matrix to which the host belongs. There are three membership partitions.
# mxsanlk
This host: 99.10.30.3
This host’s SDMP administrator: 99.10.30.
Chapter 18: SAN Maintenance 286 • trying to lock, not yet committed by owner The SANlock is either not held or has not yet been committed by its holder. The host on which mxsanlk was run is trying to acquire the SANlock. • unlocked, trying to lock The SANlock does not appear to be held. The host on which mxsanlk was run is trying to acquire the SANlock. • unlocked The SANlock does not appear to be held. If a host holds the SANlock, it has not yet committed its hold.
Chapter 18: SAN Maintenance 287 • locked (lock is corrupt, will repair) The host on which mxsanlk was run holds the lock. The SANlock was corrupted but will be repaired. If a membership partition cannot be accessed, use the mxmpconf program to correct the problem. When you invoke mxsanlk, it checks for the Storage Device Monitor Pulse (SDMP) daemon. This daemon is responsible for grabbing and maintaining the locks on the membership partitions.
Chapter 18: SAN Maintenance 288 The mxmpconf utility starts an ASCII interface that you can use to create a new set of membership partitions or to repair the existing partitions. NOTE: Matrix Server cannot be running when you use mxmpconf. To stop the matrix, use the following command: # /etc/init.d/pmxs stop After stopping Matrix Server, type mxmpconf at the operating system prompt. The Main Menu is then displayed.
Chapter 18: SAN Maintenance 289 Maintain Membership Partitions with the Repair Option The Repair Menu allows you to view the membership partition configuration and to perform several maintenance activities. The Repair Menu lists the current membership partitions according to the membership file maintained on the server where you are running the utility. Each server in the matrix has a membership partition file, which is called the “local MP list.
Chapter 18: SAN Maintenance 290 If the status is NOT FOUND or INACCESSIBLE, there may be a problem with the disk or with another SAN component. When the problem is repaired, the status should return to OK. If the status is CORRUPT, you should resilver the partition. This step copies the membership data from a valid membership partition to the corrupted partition. NOTE: The membership partition may have become corrupt because it was used by another application.
Chapter 18: SAN Maintenance 291 Sizes for Membership Partitions Matrix Server stores the size of the smallest membership partition that was created during the Matrix Server configuration. When you add or replace a membership partition, the new partition must be at least as large as that original partition. For example, if you originally created 2-GB, 3-GB, and 4-GB membership partitions, any membership partitions created later on must be at least 2 GB in size.
Chapter 18: SAN Maintenance 292 Search the SAN for Membership Partitions. The Search option searches the SAN for all partitions that appear to be membership partitions. You can also copy this data to a file.
Chapter 18: SAN Maintenance 293 The output includes each membership partition found by the search, whether the partition is active or inactive, the membership list on the disk containing the partition, and the database records for the partitions. Resilver Membership Partitions. Typically, Matrix Server writes data to one membership partition and then copies, or resilvers, that data to the other membership partitions.
Chapter 18: SAN Maintenance 294 partitions on that disk are displayed. To select a partition, move to that partition and press the spacebar. You can use the Search option on the Repair menu to locate a valid membership partition. The resilver operation synchronizes all other membership partitions and the local membership partition list. Remove a Membership Partition. The Remove option allows you to remove an existing membership partition.
Chapter 18: SAN Maintenance 295 After you select the partition to be removed, you will be asked to select the SAN disk containing the replacement partition. The partitions on that disk are then displayed. To select a partition, move to that partition and press the spacebar. When you choose the new partition, the local path to that partition will appear at the bottom of the window. Select Done to complete the operation. Add a Membership Partition.
Chapter 18: SAN Maintenance 296 The Add option asks you to select the SAN disk containing the new partition. The partitions on that disk are then displayed. To select a partition, move to that partition and press the spacebar. (The minimum size for a membership partition is 100 MB.) The local path to that partition then appears at the bottom of the window. Select Done to complete the operation.
Chapter 18: SAN Maintenance 297 Clear the Host Registry. This option removes all entries from the server registry. It should be used only under the direction of PolyServe Technical Support. CAUTION: Before clearing the server registry, be sure to reboot or power off any servers that were previously removed from the matrix and no longer have access to the SAN. After the servers have been rebooted, they can safely access the SAN.
Chapter 18: SAN Maintenance 298 When you select a disk, the partitions on that disk are displayed. To select a partition, move to that partition and press the spacebar. Information about the partition then appears at the bottom of the window. A 100-MB partition is adequate. The partition you selected is displayed on the Membership Partition Setup window. If you want to use three membership partitions, repeat this procedure to select the additional membership partitions.
Chapter 18: SAN Maintenance 299 Increase the Membership Partition Timeout Under heavy I/O load, I/O timeouts can occur on membership partition accesses. The I/O timeouts are reported as "SCSI error : <...> return code = 50000" in the file /var/log/messages. The I/O timeouts can cause problems such as the following:
• Excessive path switching.
• Filesystems appearing to be hung when a node crashes.
Chapter 18: SAN Maintenance 300 A setting of 10,000ms will be adequate for many storage arrays. If you are using DDN storage, you may need to increase the parameter beyond 10,000ms. NOTE: If you increased the parameter and are still experiencing I/O timeouts, a different issue may be causing the problem. Contact HP Technical Support for assistance. Servers Change the Fencing Method This procedure describes how to change the matrix fencing method on servers that are already running Matrix Server. 1.
Chapter 18: SAN Maintenance 301 Server Cannot Be Located If the matrix reports that it cannot locate a server on the SAN but you know that the server is connected, there may be an FC switch problem. For example, on a Brocade FC switch, log into the switch and verify that all F-Port and L-Port IDs specified in switchshow also appear in the local nameserver, nsshow. If the lists of ports are different, reboot the switch. If the reboot does not clear the problem, there may be a problem with the switch.
Chapter 18: SAN Maintenance 302 event that it cannot be fenced and cannot be rebooted. IMPORTANT: This utility must be run only after the server has been physically verified to be down. If the server is not down, running this utility could result in filesystem corruption. Do you wish to continue? y SUCCESS 99.10.20.4 has been marked as down. CAUTION: Be sure to verify that the server is physically down before running the mx server markdown command.
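A typical invocation would be similar to the following; the IP address is the one shown in the sample output above:
# mx server markdown 99.10.20.4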
Chapter 18: SAN Maintenance 303 vendor-supplied libhbaapi. (Check with your vendors to determine whether OLI is supported.) When this lower-level OLI support is in place, inserting a new disk will cause a new device to automatically become eligible for importing. The disk can be imported with the Management Console or mx utility, and can be used normally from that point forward. If OLI is not possible with your hardware combination, you will need to restart Matrix Server after inserting a new disk.
Chapter 18: SAN Maintenance 304 Host Bus Adapters (HBAs) Reduce the HBA Queue Depth The HBA queue depth is the maximum number of outstanding I/O requests that the HBA can hold while it is waiting for responses from the LUNs on the storage array.
Chapter 18: SAN Maintenance 305 Change the HBA Queue Depth To change the HBA queue depth, you will need to edit the /etc/opt/polyserve/fc_pcitable file. In the file, locate the line for your HBA driver. The following examples are for QLogic 2300 and Emulex 900 drivers.
#0x1077 0x2300 qla2300 scsi QLogic 2300 Adapter w/v6.01 driver
#0x10df 0xf900 lpfcdd scsi Emulex Lightpulse 9000 FibreChannel Adapter
Edit the line as follows:
• Remove the comment character (#) from the beginning of the line.
Chapter 18: SAN Maintenance 306 Replace an HBA Card To replace an HBA card, complete the following steps:
1. Shut down the Matrix Server software: # /etc/init.d/pmxs stop
2. Disable the Matrix Server startup script. Type this command: # /sbin/chkconfig --del pmxs
3. Shut down the server: # init 0
4. Swap out the HBA card.
5. Power up the server.
6. Configure Matrix Server’s default HBA driver version for the hardware installed on your system: # /opt/polyserve/lib/chhbadriver default
7.
Chapter 18: SAN Maintenance 307 To install a non-default HBA driver version, complete these steps:
1. Shut down the Matrix Server software: # /etc/init.d/pmxs stop
2. Disable the Matrix Server startup script. Type this command: # /sbin/chkconfig --del pmxs
3. Reboot the server.
4. Use the following command to display a list of the HBA driver versions provided with Matrix Server: # /opt/polyserve/lib/chhbadriver list
5. Run the chhbadriver command again with the appropriate driver version.
6.
Chapter 18: SAN Maintenance 308
1. Stop Matrix Server if it is currently running: # /etc/init.d/pmxs stop
2. Disable the Matrix Server startup script. Type this command: # /sbin/chkconfig --del pmxs
3. Reboot the server.
4. Run the lspci(8) command to determine the vendor ID and device ID of the FC host bus adapters on the server.
5. Update the fc_pcitable file with information about the driver that you installed. The beginning of the file describes the syntax of the entries in the file.
Chapter 18: SAN Maintenance 309 Format of the fc_pcitable File The fc_pcitable file contains entries only for the drivers installed on your system. By default, the entries in the file are commented out, as indicated by the comment character (#). The file is used only if you add a new entry to the file or modify a default entry (by removing the comment character and then changing the appropriate values).
Chapter 18: SAN Maintenance 310
# HBA API libraries are located in /usr/lib. Emulex libraries are called
# libHBAAPI.so, libemulexhbaapi.so and libdfc.so and are included with the
# HBAnywhere package, included with the HBA driver that can be found on the
# Emulex web site. QLogic HBA API libraries are called libHBAAPI.so and
# libqlsdm.so. These are included in the source for the HBA driver that can
# be found at the QLogic web site.
Chapter 18: SAN Maintenance 311 Online Replacement of a FibreChannel Switch When a matrix includes multiple FibreChannel switches, you can replace a switch without affecting normal matrix operations. The following conditions must be met when performing online replacement of a FibreChannel switch: • The replacement switch must be the same model as the original switch and must have the same number of ports. • The FC connectors must be reinserted in the same location on the new switch.
Chapter 18: SAN Maintenance 312 5. Connect the power and either the Ethernet or the serial console cable to the new switch. 6. Log on to the new switch. 7. Disable the switch with the switchDisable command. 8. Disable any stale active configuration on the new switch with the cfgDisable command. 9. Verify that the Brocade licenses are installed by using the licenseShow command. The new switch should have the same kind of license as the rest of the fabric. 10.
Chapter 18: SAN Maintenance 313 17. Verify that the new switch has been configured into the matrix. Run the /opt/polyserve/sbin/mxmpio status command, which shows whether failover is enabled or disabled. (See mxmpio(8) for details.) 18. Verify that I/O operations are successful via the new switch. Mount a psd device, and then use mxmpio to set its active path to one of the paths that goes through the new switch. Then perform I/O operations such as creating or deleting files on the mounted psd device.
Chapter 18: SAN Maintenance 314 5. Make the switch operating mode and domain ID acceptable to the original fabric. This can be done either by consulting the fabric or by taking the values from the data saved in step 2. This procedure might include changing the default zone setting as directed by EWS when changing interoperation mode. Any existing zone configuration on the new switch should be removed to allow the fabric to properly communicate current zoning when the switch joins the fabric. 6.
19 Other Matrix Maintenance This chapter describes how to maintain the Matrix Server log files and how to collect them for analysis by PolyServe Technical Support. It also describes maintenance and troubleshooting procedures for servers and monitors. Maintain Log Files Matrix Server stores its log files in the /var/log/polyserve directory on each server in the matrix. These log files are typically used to record the actions of the Matrix Server daemons and agents. The matrix.
Chapter 19: Other Matrix Maintenance 316 Read the Matrix Log File Select the server where you want to view the log, right-click, and then select View Log. The Server Log window displays the most recent messages from the matrix.log file. You can select the types of messages that you want to view by checking or unchecking the boxes at the top of the window. Use the scroll bars to move up, down, left, and right in the file, allowing you to see entire messages without resizing the window.
Chapter 19: Other Matrix Maintenance 317 files are named matrix.log.1, matrix.log.2, and so on. You can also rotate the matrix log file from the Management Console. Select the appropriate server on the Servers window, right-click, and select Rotate Log. The Management Console will display an error message if the rotate option fails. Add Your Own Messages to the Matrix Log File You can use the mxlogger command to add your own messages to the matrix.log file.
Chapter 19: Other Matrix Maintenance 318 • Commands invoked via the PolyServe Management Console • mx commands, with the exception of status commands The audit entry contains the IP address and port number of the client TCP connection. It also includes the user name used to log into the Management Console or invoke the mx command. Login authentications are recorded as either “success” or “failure.” The results of the Management Console and mx commands are not recorded.
Chapter 19: Other Matrix Maintenance 319 The /tmp directory is used to hold files temporarily while mxcollect is running. The final output from mxcollect is created in the directory from which you run mxcollect. In this example, we will run the utility from the /tmp directory. [root@venus1 /]# cd /tmp [root@venus1 tmp]# /opt/polyserve/tools/mxcollect This utility should only be run from a server with PolyServe Matrix Server installed.
Chapter 19: Other Matrix Maintenance 320
[root@venus1 tmp]# ls
7659  hsperfdata_root  ksocket-root  matrixinfo  mxcollect_20060926_161258.tgz
_4E123592  _4E123593  _4E124578  _4E124608
[root@venus1 tmp]#
Upload mxcollect Files to Technical Support After running mxcollect, you can upload the resulting files to HP Support. Contact HP Support for more information. Matrix Alerts The Alerts section at the bottom of the Management Console window lists errors that have occurred in matrix operations.
Chapter 19: Other Matrix Maintenance 321 Troubleshoot Matrix Problems The following situations do not produce specific error messages. The Server Status Is “Down” If a server is running but Matrix Server shows it as down, follow these diagnostic steps: 1. Verify that the server is connected to the network. 2. Verify that the network devices and interfaces are properly configured on the server. 3. Ensure that the ClusterPulse daemon is running on the server. 4.
Chapter 19: Other Matrix Maintenance 322 Matrix Server Exits Immediately If the ClusterPulse daemon exits immediately on starting, check the last lines of the following files for errors: • /var/log/polyserve/matrix.log • /var/log/polyserve/mxinit.log This problem typically occurs because either the hostname is not set properly on the server or the main Ethernet interface is not installed. Refer to the ifconfig man page for ways to check this.
Chapter 19: Other Matrix Maintenance 323 monitor was not found, the HTTP service monitor will be reported as Down. “Undefined” Status If the probe has not completed because of a script configuration problem or because Matrix Server is still attempting to finish the first probe, the status will be reported as “undefined” instead of Down. “SYSTEM ERROR” Status The “SYSTEM ERROR” status indicates that a serious system functional error occurred while Matrix Server was trying to probe the service.
Chapter 19: Other Matrix Maintenance 324 START_TIMEOUT. A Start script was executed but it did not complete within the specified timeout period. STOP_TIMEOUT. A Stop script was executed but it did not complete within the specified timeout period. RECOV_TIMEOUT. A Recovery script was executed but it did not complete within the specified timeout period. START_FAILURE. A Start script was executed but it returned a non-zero exit status. STOP_FAILURE.
Chapter 19: Other Matrix Maintenance 325 Because the error is server-specific, you must clear it on each server in the matrix (just as you had to correct the script on each server that reported a problem). NOTE: An error on a monitor may still be indicated after correcting the problem with the Start, Stop, Recovery, or probe script. Errors can be cleared only with the Management Console or the appropriate mx command. An error will not be automatically cleared by the ClusterPulse daemon.
Chapter 19: Other Matrix Maintenance 326 “Activity Unknown” Status For a brief period while the monitor_agent daemon checks the monitor script configuration and creates a thread to serve the monitor, the activity may be displayed as “activity unknown.” “Transitioning” Activity The “Transitioning” activity indicates that the monitor state is on its way to becoming ACTIVE or INACTIVE (or starting or stopping, if a Start or Stop script is present).
Chapter 19: Other Matrix Maintenance 327 service or device to fail periodically and you do not want to take the failover action for a single probe failure. Putting a script like this in place essentially implements a “two consecutive probe-script failure” probe.
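A minimal sketch of such a wrapper, assuming the real probe is a separate executable (its path and the state-file location are assumptions), is:
#!/bin/sh
# Hypothetical wrapper: report failure only after two consecutive
# failures of the real probe script.
STATE=/var/run/myprobe.failed
if /usr/local/bin/real_probe "$@"; then
    rm -f "$STATE"     # probe succeeded; clear the failure marker
    exit 0
fi
if [ -f "$STATE" ]; then
    exit 1             # second consecutive failure: report it
fi
touch "$STATE"         # first failure: remember it, but report success
exit 0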
A Management Console Icons The Management Console uses the following icons. Matrix Server Entities The following icons represent the Matrix Server entities. If an entity is disabled, the color of the icon becomes less intense.
Appendix A: Management Console Icons 329 Additional icons are added to the entity icon to indicate the status of the entity. The following example shows the status icons for the server entity. The status icons are the same for all entities and have the following meanings. Monitor Probe Status The following icons indicate the status of service monitor and device monitor probes. If the monitor is disabled, the color of the icons is less intense.
Appendix A: Management Console Icons 330 On the Applications tab, virtual hosts and single-active monitors use the following icons to indicate the primary and backups. Multi-active monitors use the same icons but do not include the primary or backup indication. Management Console Alerts The Management Console uses the following icons to indicate the severity of the messages that appear in the Alert window.
B Error and Log File Messages When certain errors occur, Matrix Server writes messages to the Management Console. Other error messages are written to the server’s log file (matrix.log). Management Console Alert Messages NN.NN.NN.NN has lost a significant portion of its SAN access, possibly due to a SAN hardware failure The specified server is unable to write to any of the membership partitions. Ensure that the server can access the membership partitions and also has write access to them.
Appendix B: Error and Log File Messages 332 NN.NN.NN.NN should be rebooted ASAP as it stopped matrix network communication DATE HH:MM:SS and was excluded from the SAN to protect filesystem integrity The server was excluded from the matrix because it could no longer communicate over the network. The server should be rebooted at the first opportunity. Also check the network and make sure that the server is not experiencing a resource shortage. NN.NN.NN.
Appendix B: Error and Log File Messages 333 Error getting cluster status from server: <error> The <error> describes the error. The connection to the server on port 9050 was successful but the first response from the server experienced an I/O error. Error logging in: I/O error connecting to server An I/O error occurred during the authentication of the Management Console to the ClusterPulse daemon. View recent entries in the matrix.
Appendix B: Error and Log File Messages 334 Majority of membership partitions are unwritable, possibly due to a SAN or storage hardware failure. As a result, disk imports and deports cannot be done, and some servers may be unable to mount filesystems. In addition, Matrix Server’s ability to recover from a future server failure is compromised. Such a failure would leave Matrix Server no option but to pause some or all filesystems throughout the matrix to preserve filesystem integrity.
Appendix B: Error and Log File Messages 335 Matrix unable to take control of SAN, because the servers are unable to perform fencing operations, possibly due to a networking or fencing hardware failure or misconfiguration. As a result, some or all filesystem operations may be paused throughout the matrix. In addition, filesystem mounts and unmounts and disk imports and deports can not be performed.
Appendix B: Error and Log File Messages 336 Membership partition XXXX is unwritable, possibly due to a SAN or storage hardware failure. If other membership partitions become inaccessible, Matrix Server’s ability to recover from a server failure will be compromised. None of the servers in the matrix can write to the specified membership partition. Ensure that the servers can access the membership partition and that they have write access for it.
Appendix B: Error and Log File Messages 337 Reboot ASAP as it stopped matrix network communication at date/time but attempts to exclude it from the SAN were unsuccessful! Rebooting it will allow normal matrix operation to continue. Alternatively, if the server cannot be rebooted, but can be confirmed to have no access to the SAN, run ‘mx server markdown <server>’ to restore normal matrix operation. Matrix Server cannot fence a server that is no longer communicating with the matrix.
Appendix B: Error and Log File Messages 338 Singleton matrix unable to take control of SAN. Possibly this server has not been added to the matrix or has been deleted from the matrix, or possibly a network failure has partitioned this server from the rest of the matrix. As a result, some or all filesystem operations may be paused throughout the matrix. In addition, filesystem mounts and unmounts and disk imports and deports cannot be performed.
Appendix B: Error and Log File Messages 339 ClusterPulse Messages Bad command -- Could not find device monitor instance for XXX on server YYY The monitor_agent daemon is reporting status on a device monitor with device name XXX on server YYY but the ClusterPulse daemon does not recognize this device. Probably the Management Console has removed the device monitor and monitor_agent has already sent the status to ClusterPulse. Therefore, no corrective action is required.
Appendix B: Error and Log File Messages 340 Internal system error -- Internal error at server X.X.X.X: select returned with an unknown read socket N Internal system error -- Internal error at server X.X.X.X: select returned with an unknown write socket N Internal system error -- Internal select error at server X.X.X.X: [select ?] with errno of N The ClusterPulse daemon received a system error. Report this error to PolyServe Technical Support at your earliest opportunity.
Appendix B: Error and Log File Messages 341 Monitor error -- monitor_agent reported N:: The monitor_agent daemon experienced an error and is copying the error string to the matrix.log file. Inspect the error string for details about resolving the error. Network error -- set_readable called with unknown socket N Network error -- set_writeable called with unknown socket N If you receive this message, notify PolyServe Technical Support at your earliest convenience.
Appendix B: Error and Log File Messages 342 Script error -- Write to monitor failed: <error>. This probably means the agent has crashed for agent monitor_agent. Shutting down agent. The ClusterPulse daemon experienced an error while trying to write to the monitor_agent daemon. It will attempt to recover from this failure. Script error -- Matrix Server cannot invoke a non-executable agent monitor_agent Verify that the execute permission on monitor_agent is set correctly.
Appendix B: Error and Log File Messages 343 PSFS Filesystem Messages If you receive a panic message from the PSFS filesystem, report it to PolyServe Technical Support at your earliest convenience. Then reboot the affected server to recover from the error condition. Distributed Lock Manager Messages The Distributed Lock Manager (DLM) generates error messages if it detects that a filesystem operation will block indefinitely because of an internal error.
Appendix B: Error and Log File Messages 344 SCL Daemon Messages If messages such as the following appear in the matrix log, the matrix may not be able to start up properly. Contact PolyServe Technical Support for assistance. CRITICAL ERROR: Unable to fence host xxxxxxxx xxxxxxx: xxxx (switch=xxxx) Network Interface Messages The PanPulse daemon generates messages about the state of the network interfaces configured in the matrix.
Appendix B: Error and Log File Messages 345 PanPulse then fails over the active interface to another network interface and reports the following. (You will also see this message for other situations in which PanPulse chooses a new active interface.) Selected new active interface address
Appendix B: Error and Log File Messages 346 Port 8940 Only one instance of PanPulse can be running on port 8940 on a server. If another application is using that port or another instance of PanPulse is started, the following error will be reported. Unable to bind on port 8940. Please make sure that this is the only copy of panpulse running on this server. mxinit Messages mxinit prints a series of messages when it cannot complete an operation.
Appendix B: Error and Log File Messages 347 To resolve the problem, check the switch for faulting or failed components such as GBICs and/or faulting slots. Loss of Network Access or Unresponsive Switch Errors such as the following can occur when a Matrix Server node reacts to the loss of network access or to an unresponsive FibreChannel switch. FenceAgent 172.23.186.174: no response to queries from 172.23.186.30 172.23.186.
Appendix B: Error and Log File Messages 348 Default VSAN Is Disabled on Cisco MDS FC Switch If you are using a Cisco MDS FibreChannel switch with firmware version 2.1 or later, you may see alerts such as the following on the PolyServe Management Console: FenceAgent <switch>: no response to queries from <server> You might also see additional logging in psSAN.log: Transient Error: fabric.
Index A administrative network defined 5 failover 60 network topology 58 requirements for 57 select 58 alerts Alerts pane on Management Console 34 display error on Management Console 34 display on command line 320 icons shown on Management Console 330 applications create 193 filter 197 manage 199 name of 193 status 195 Applications tab drag and drop operations 199 filter applications 197 format 197 icons 195 manage application monitors 204 manage applications 199 menu operations 202 modify display 195 rehos
Index 350 command-line administration 28 configuration active-active 157 active-passive 158 back up 40 device monitor 240 examples 157 files, backup or restore 40 limits 21 network interface 57 notifier 256 PSFS filesystems 93 SAN disks 64 server 49 service monitor 225 virtual host 209 configurations, supported 12 Connect window authentication parameters 23 bookmarks 24 Clear History button 23 Connect button 23 context dependent symbolic links 126 custom monitors device 238 service 224 custom scripts 270
Index 351 E environment variables scripts 274 error messages ClusterPulse 339 DLM 343 Fence Agent 346 management console 331 mxinit 346 network interface 344 PSFS filesystem 343 SANPulse 343 SCL daemon 344 errors, status summary 153 events alert messages 34 defined 11 device monitor clear from Console 254 event severity behavior 247 view 254 service monitor clear from Console 235 event severity behavior 232 view 234 Export Group add 137 advanced options script ordering 152 advanced options for monitor eve
Index 352 crash recovery 96 create 98 create with mkpsfs command 102 create with mx command 102 destroy 125 extend 116 features 93 features, configured 117 journal 94 mount 105 mount information 119 persistent mounts 113 properties 115 quotas, defined 96 quotas, enable 100 recover after eviction 125 recreate 101 relabel 115 resize 124 restrictions 96 suspend 123 unmount 112 view mount errors on server 120 view status 119 Filesystems tab, on Management Console 32 firewall 42 FTP service monitor 223 G gett
Index 353 location of 24 permissions 24 update to use new features 27 membership partitions active 290 add 295 defined 65 display 291 inactivate 296 inactive 290 increase timeout 299 remove 294 repair 289 replace 294 resilver 293 mkpsfs command 102 monitor, high-availability clear errors 155 delete 155 disable 155 enable 155 scripts 150 multipath I/O defined 12 example 15 manage with mxmpio 74 QLogic driver, enable failover 78 third-party 79 mx commands 28 mx server markdown command 301 mxcollect utility
Index 354 disable 257 enable 258 Notifiers tab, on Management Console 33 O objects Export Groups 131 export records 132 Virtual NFS Services 132 OLI, storage 302 P PanPulse daemon administrative network 58 defined 7 error messages 344 partitions on SAN disks repartition 69 passwords 35 Performance Dashboard counter details 262 counters, defined 267 for all servers 260 for one server 264 start from command line 266 persistent filesystem mounts 113 pmxs script 38 PolyServe Technical Support FTP account 31
Index 355 scripts, device monitor configure 245 event severity 247 ordering 247 scripts, Export Group 149 scripts, notifier 259 scripts, service monitor configure 230 event severity 232 SDMP daemon 7 server backup 10 cannot fence 301 change FC switch port 50 device monitor, associate with 250 memory resources 11 primary 10 SAN access 283 server configuration add or update 49 delete 51 disable 51 DNS load balancing 54 enable 51 server log 315 server registry clear 297 defined 94 service monitor activity st
Index 356 monitors 322 software problems 321 U users, authentication 23 V virtual host active and inactive interfaces 206 activeness policy 215 applications, configure to recognize 212 change virtual IP address 213 defined 206 device monitors, dependency on 249 enable or disable network interface 63 failover 215 guidelines 208 policy for failback 210 primary and backup servers 206 rehost via Applications tab 213 virtual host configuration add or update 209 change IP address 213 delete 214 Virtual Hosts