PolyServe® Matrix Server Administration Guide PolyServe® Matrix Server 3.
Copyright 1999-2006 PolyServe, Inc. Use, reproduction and distribution of this document and the software it describes are subject to the terms of the software license agreement distributed with the product (“License Agreement”). Any use, reproduction, or distribution of this document or the described software not explicitly permitted pursuant to the License Agreement is strictly prohibited unless prior written permission from PolyServe has been received.
Contents

1 Introduction
Product Features
Overview
The Structure of a Matrix
Software Components
Shared SAN Devices

Authentication Parameters and Bookmarks
Manage Bookmarks
Manage a Matrix from the Command Line
The PolyServe Management Console
Virtual Hosts Tab
Applications Tab

Making Network Changes
Add or Modify a Network Interface
Remove a Network Interface
Allow or Discourage Administrative Traffic
Enable or Disable a Network Interface for Virtual Hosting

5 Configure the SAN
Overview

Disk Quotas
Crash Recovery
Create a Filesystem
Create a Filesystem from the Management Console
Recreate a Filesystem

10 Matrix Operations on the Applications Tab
Applications Overview
Create Applications
The Applications Tab
Application States
Filter the Applications Display

Clear Service Monitor Errors

13 Configure Device Monitors
Overview
Types of Device Monitors
Device Monitors and Failover
Device Monitor Activeness Policy

Validate Load-Balancing When a Server Is Down
Test LAN Failover of Administrative Matrix Traffic

16 Advanced Topics
The Effect of Monitors on Failover
Service Monitors
Custom Device Monitors
Integrate Custom Applications

Notify PolyServe Technical Support
Check the Server Configuration
Disable a Server for Maintenance
Troubleshoot Matrix Problems
Matrix Server Fails to Start
The Server Status Is “Down”
1 Introduction
PolyServe Matrix Server provides a matrix structure for managing a group of network servers and a Storage Area Network (SAN) as a single entity.

Product Features
Matrix Server includes the following features:
• Fully distributed data-sharing environment. The PSFS filesystem enables all servers in the matrix to directly access shared data stored on a SAN.
line interface enable you to configure and manage the entire matrix either remotely or from any server in the matrix.
• Failover support for network applications. Matrix Server uses virtual hosts to provide highly available client access to mission-critical data for Web, e-mail, file transfer, and other TCP/IP-based applications.
Servers. Each server must be running Matrix Server.
Public LANs. A matrix can include up to four network interfaces per server. Each network interface can be configured to support multiple virtual hosts, which provide failover protection for Web, e-mail, file transfer, and other TCP/IP-based applications.
Administrative Network. Matrix Server components communicate with each other over a common LAN. The network used for this traffic is called the administrative network.
[Figure: Matrix Server software components. Labels: Processes; Kernel components; Administrative Network; ClusterPulse; Distributed Lock Manager; Management Console; SDMP; SANPulse; SCL; PSFS module; psd and psv drivers; PanPulse; mxlogd; mxlog; grpcommd; Other processes]

SANPulse process. Provides the matrix infrastructure for management of the SAN. SANPulse coordinates filesystem mounts, unmounts, and crash recovery operations.
PSFS filesystem module. The shared filesystem.
SCL process. Manages shared storage devices.
psd driver. Provides matrix-wide consistent device names among all servers.
psv driver. Used by the Matrix Server Volume Manager, which creates, extends, recreates, or destroys dynamic volumes.
PanPulse process. Selects and monitors the network to be used for the administrative network, verifies that all hosts in the matrix can communicate with each other, and detects any communication problems.
mxlogd process. Manages global error and event messages.
PSFS Filesystems
A PSFS filesystem can be created either on a dynamic volume created with the Volume Manager or on a partition of a disk that has been imported into the matrix. The PSFS filesystem provides the following features:
• Concurrent access by multiple servers. After a filesystem has been created on a shared disk, the filesystem is available to all matrix servers having physical access to the device via the SAN.
• Volume database. This database stores information about dynamic volumes and is located on the membership partitions.

Virtual Hosts and Failover Protection
Matrix Server uses virtual hosts to provide failover protection for servers and network applications. A virtual host is a hostname/IP address configured on one or more servers. The network interfaces selected on those servers to participate in the virtual host must be on the same subnet.
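The same-subnet requirement can be checked before a virtual host is configured. The following Python sketch is illustrative only and is not part of Matrix Server; the function name and the sample addresses are assumptions. It uses the standard ipaddress module to confirm that every candidate interface falls in one subnet.

```python
import ipaddress

def same_subnet(interfaces):
    """Return True if every (ip, netmask) pair belongs to the same subnet.

    `interfaces` is a list of (ip, netmask) string pairs, one per network
    interface that would participate in the virtual host (hypothetical input).
    """
    networks = {
        ipaddress.ip_interface(f"{ip}/{mask}").network
        for ip, mask in interfaces
    }
    return len(networks) == 1

# Hypothetical addresses for two servers.
candidates_ok = [("10.10.1.11", "255.255.255.0"), ("10.10.1.12", "255.255.255.0")]
candidates_bad = [("10.10.1.11", "255.255.255.0"), ("10.10.2.12", "255.255.255.0")]
print(same_subnet(candidates_ok), same_subnet(candidates_bad))
```

A check of this kind only validates addressing; it does not confirm that the interfaces can actually reach each other.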
Matrix Server includes several built-in service monitors for monitoring well-known network services. You can also configure custom monitors for other services.
A device monitor is similar to a service monitor; however, it is assigned to one or more servers. Matrix Server provides several built-in device monitors. The DISK device monitor can be used to watch local disk drives or to check access to a partition on a SAN disk. The GATEWAY device monitor watches gateway devices.
server, with a ratio of 8), paging can increase on the smallest server to the extent that overall matrix performance is significantly reduced.

Supported Configurations
Matrix Server supports multiple FibreChannel switches configured as a single fabric and multiported SAN disks. iSCSI arrays are also supported. The following diagrams show some sample matrix configurations using these components.

Single FC Port, Single FC Switch, Single Fabric
This is the simplest configuration.
Single FC Port, Dual FC Switch, Single Fabric
In this example, the fabric includes two FibreChannel switches. Servers 1–3 are connected to the first FC switch; servers 4–6 are connected to the second switch. The FC switches are connected to two RAID arrays, which contain multiported disks. If a switch fails, the servers connected to the other switch will survive and access to storage will be maintained.
iSCSI Configuration
This example shows an iSCSI configuration. The Microsoft iSCSI initiator is installed on each server. Ideally, a separate network should be used for connections to the iSCSI storage arrays.

[Figure: Servers 1–6 connected through a network switch to two iSCSI arrays]

PolyServe Technical Support
PolyServe Technical Support provides both technical assistance and product information and downloads.
When reporting an incident, please have the following information available:
• Your operating system and the affected product.
• A description of the problem.
• A timeline of the events that occurred.
You may be asked to run the Matrix Server mxcollect utility, which collects system logs and other information, and then forward the logs to PolyServe. For information about running mxcollect, see “Collect Log Files with mxcollect” on page 211.
2 Matrix Administration
PolyServe Matrix Server can be administered either with the PolyServe Management Console or from the command line.

Administrative Considerations and Restrictions
You should be aware of the following when managing Matrix Server.

Network Hostname Resolution
Normal operation of the matrix depends on a reliable network hostname resolution service. An unreliable hostname lookup facility can cause reliability problems for the running matrix.
• Certain Microsoft Knowledge Base articles caution that in the case of Exchange SMTP, and possibly other applications, the use of the hosts file can interfere with mail flow (see Microsoft Knowledge Base article 296215).
• Although using the hosts file provides immunity to DNS problems, it must be manually updated on each node. For example, if an IP address changes, all of the hosts files must be updated.
connection for the server must not be established while Matrix Server is running on the server.
• If servers from multiple matrices can access the SAN via a shared FC fabric, avoid importing the same disk into more than one matrix. Filesystem corruption can occur when different matrices attempt to share the same filesystem.
• Changing the hardware configuration, such as adding or replacing switch modules, is not supported while Matrix Server is running.
• Matrix Server nodes should not be used as domain controllers because the two services will compete for resources, resulting in decreased performance.
• The DNS servers used by Active Directory and Matrix Server should not reside on Matrix Server nodes. Placing the DNS servers on Matrix Server nodes creates a race condition that prevents Matrix Server from starting.
Tested Configuration Limits
PolyServe has tested Matrix Server configurations up to the following limits:
• 16 servers per matrix for FC fabric configurations; six servers per matrix for iSCSI configurations
• 256 imported LUNs per matrix for FC fabric configurations; for iSCSI configurations, the maximum number of connections for the iSCSI initiator
• 128 filesystems per matrix
• 2048 filesystem mounts per matrix
• 64 virtual hosts per matrix
• 128 service and/or devic
Matrix Management Applications
Matrix Server provides two management applications: mxconsole, the PolyServe Management Console, and mx, the corresponding command-line interface. These applications can be run from either a matrix server or a local machine outside the matrix. On a local machine, the management applications download their rendering and application logic from the servers in the matrix.
Connect to: Type a matrix or server name or select a name from the drop-down list. When you connect to a server or matrix, it is added to the drop-down list. Click the Clear History button to delete the list. (Saved bookmarks will remain.)
Connect: This button provides two options: Connect or Configure; click the down arrow to see the options.
Password: Type the user’s password. If you do not want to be prompted for the password again, click the “Remember this password” checkbox. (For the password to be saved, you will also need to create a bookmark.)
Add to bookmarks: Click this checkbox to create a bookmark for this connection. When you click OK on the Authentication Parameters dialog, you can configure the bookmark on the Add Bookmark dialog.
The bookmark options are:
• Add. This option opens the Add Bookmark dialog, allowing you to configure a new bookmark.
• Delete. If a matrix is selected, this option removes the bookmark for that matrix. If a server is selected, the option removes just that server from the bookmark.
• Rename. If a matrix is selected, this option allows you to rename that matrix. If a server is selected, you can replace that server with a different server in the matrix.
• Move Up/Move Down. Use these buttons to reorder the list in the Bookmarks window. This option is especially useful for changing the connection order for the servers in a matrix.
NOTE: If you are using a wildcard to match the servers in the matrix, the wildcard entry should appear after any server entries. You can use the move buttons to reorder the entries as necessary. (For information about wildcards, see the description of the .
• Remove the server entries from the .matrixrc file. Then specify one of the servers on the Matrix Server Connect window. When the Authentication Parameters dialog appears, check “Add to Bookmarks.”
Including passwords in the .matrixrc file is now optional. You can remove the passwords from your file if desired, or select the bookmark entry on the Connect window and click Reset. For more information about the .
There are several menus at the top of the window:
• Matrix. Add matrix entities, disconnect from the matrix, access the Configure Matrix window, exit the Management Console.
• Edit. Modify the entity selected on the console. For example, you can enable or disable a monitor.
• View. Show information about the selected entity.
• Storage. Access options for disks and dynamic volumes.
• Window. View Matrix Server windows.
• Help.
The toolbar can be used for matrix connections, to add matrix entities such as virtual hosts or filesystems, to import or deport disks, to display the Storage Summary, to collapse or expand the entity lists, and to display the online help. “Management Console Icons” on page 222 describes the icons used to represent matrix entries and their status.

Virtual Hosts Tab
The Virtual Hosts tab shows all virtual hosts in the matrix.
Applications Tab
This view shows the application monitors configured in the matrix and provides the ability to manage and monitor them from a single screen. The tab uses a table format, with a column for each server in the matrix. The application monitors appear in the rows of the table. You can reorder the information on this tab or limit the information that is displayed.
Filesystems Tab
The Filesystems tab shows all PSFS filesystems in the matrix.
Notifiers Tab
The Notifiers tab shows all notifiers configured in the matrix.
Matrix Alerts
The Alerts section at the bottom of the Management Console window lists errors that have occurred in matrix operations. Double-click an alert to view the error in the matrix tree structure. If you receive an alert telling you to reboot a server, the message will remain in the Alerts section until either Matrix Server is restarted on the rebooted server or the server is removed from the matrix.
Assign or Change Passwords
You must be user admin to make changes to the configuration of the matrix. Other users can view the matrix configuration but cannot make any changes to it. You can use the Matrix Configuration window to change the password for admin. The Matrix Server UserManager or the mxpasswd command can be used to assign or change passwords for admin and other users.
Start Matrix Server
By default, Matrix Server starts automatically when the system is booted. This feature is controlled by the Startup dialog. If you do not want Matrix Server to start when the system is booted, use the Startup dialog to change the service from Automatic to Manual. To start Matrix Server on a particular server, use one of these methods:
• Issue the command net start matrixserver from a CMD prompt.
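If the net start command needs to be issued from a script rather than typed at a CMD prompt, it can be wrapped as shown in this Python sketch. The wrapper functions are hypothetical helpers and not part of Matrix Server; only the command net start matrixserver comes from this guide. The default dry-run mode returns the command text without running it.

```python
import subprocess

SERVICE_NAME = "matrixserver"  # service name as it appears in the command above

def build_start_command(service=SERVICE_NAME):
    """Build the argument list for `net start <service>`."""
    return ["net", "start", service]

def start_matrix_server(dry_run=True):
    """Return the command text (dry run) or run it and return the exit code.

    Running the command for real assumes a Windows host and sufficient
    privileges to start services.
    """
    cmd = build_start_command()
    if dry_run:
        return " ".join(cmd)
    completed = subprocess.run(cmd, check=False)
    return completed.returncode

print(start_matrix_server())  # dry run: shows the command only
```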
Back Up and Restore the Matrix Configuration
It is important to back up the matrix configuration whenever changes are made. You can then easily restore the matrix if necessary. If you are using Matrix Server disk quotas, you should also back up the Matrix Server quota information after backing up your PSFS filesystems.
3 Configure Servers
Before adding a server to a matrix, verify the following:
• The server is connected to the SAN if it will be accessing PSFS filesystems.
• The server is configured as a fully networked host supporting the services to be monitored. For example, if you want Matrix Server to provide failover protection for your Web service, the appropriate Web server software must be installed and configured on the servers.
• If the hosts file has been modified, it should be consistent with the DNS.
2. Install Matrix Server. Insert the Matrix Server CD into the CD drive or go to the directory where you downloaded the product. Then double-click the file MxS_..msi and run the Installation Wizard.
3. Import the existing matrix configuration. You can import the configuration from any server in the matrix. Complete these steps:
a. On the new server, select Start > Programs > PolyServe Matrix Server > Management Console.
• Use the PolyServe Management Console to change drive letter assignments. Note that the change will take place on all nodes and may affect applications.
• Use Windows Disk Manager to change the assignments. If you are using Windows 2000 Terminal Services to make the change, you will need to log out and then log back in before you can use the reassigned drive letters.
When the original server is restored to normal operation (for example, the power is restored and the server is rebooted), ClusterPulse uses the Server Severity to determine whether it is possible to fail back virtual hosts to that server automatically. ClusterPulse also considers each virtual host’s failback policy, which specifies whether it should fail back or remain on the backup server. (See “Virtual Hosts and Failover” on page 128 for more information.
Other Server Configuration Procedures

Delete a Server
Select the server to be deleted from the Servers window on the PolyServe Management Console, right-click, and select Delete. To delete servers from the command line, use this command:
mx server delete ...

Disable a Server
Select the server to be disabled from the Servers window on the PolyServe Management Console, right-click, and select Disable.
3. Start Matrix Server on server S2a. The server joins the matrix, which now consists of servers S1, S2, S3, and S2a. Server S2 is down and S1, S2a, and S3 are up.
4. Delete server S2 from the matrix. This step will remove references to the server.
5. Update virtual hosts and any other matrix entities that used server S2 to now include S2a.
3. Select the server in the Address column and then click Export. The Last Operation Progress column will display status messages as the configuration is exported to the server.
4. Start Matrix Server on the server. The server will still be selected in the Address column. Click Start Service to start Matrix Server. A status message will appear in the Last Operation Progress column. When Matrix Server is running on the server, you can close the window.
NOTE: If there is a .matrixrc file on the system running mxconsole, you will see a Disconnect dialog instead of the Connection Parameters window. Select “Logon to another matrix server” and then click the Configure button.
2. Select the Matrix Wide Configuration tab and then stop the service on all nodes.
3. Return to the General Settings tab and select Change License File.
4. Type the path to the new license file or browse to it.
5. Click Apply.
6.
Migrate Existing Servers to Matrix Server
In Matrix Server, the names of your servers should be different from the names of the virtual hosts they support. A virtual host can then respond regardless of the state of any one of the servers. In some cases, the name of an existing server may have been published as a network host before Matrix Server was configured.
new virtual hostname are automatically redirected by Matrix Server to a backup server.

Configure Servers for DNS Load Balancing
Matrix Server can provide failover protection for servers configured to provide domain name service (DNS) load balancing. DNS load balancing allows you to set up servers so that requests are sent alternately to each of the servers in a matrix.
[Figure: Virtual host traffic shared by two servers. acmd1 is the primary for virtual_acmd1 and the backup for virtual_acmd2; acmd2 is the primary for virtual_acmd2 and the backup for virtual_acmd1.]

The addresses on the name server are virtual_acmd1 and virtual_acmd2. Two virtual hosts have also been created with those names. The first virtual host uses acmd1 as the primary server and acmd2 as the backup. The second virtual host uses acmd2 as the primary and acmd1 as the backup.
IP address: The IP addresses for the virtual hosts that you will use for each server in the matrix. These are the IP addresses that the DNS will use to send alternate requests.
With this setup, the domain name server sends messages in a round-robin fashion to the two virtual hosts indicated by the IP addresses, causing them to share the request load.
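The effect of round-robin distribution can be illustrated with a short simulation. The Python sketch below is not a DNS client and does not reflect any particular name server's implementation; it simply rotates through the two virtual host names used in this example and counts how many requests each one receives. The function name and request count are assumptions.

```python
from collections import Counter
from itertools import cycle

def round_robin(addresses, n_requests):
    """Assign n_requests to the given addresses in strict rotation."""
    rotation = cycle(addresses)
    return [next(rotation) for _ in range(n_requests)]

virtual_hosts = ["virtual_acmd1", "virtual_acmd2"]
assignments = round_robin(virtual_hosts, 6)
load = Counter(assignments)
print(assignments)
print(dict(load))
```

With an even number of requests, each virtual host receives the same share. If one server fails, its virtual host moves to the backup server, so the rotation itself does not need to change.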
4 Configure Network Interfaces
When you add a server to the matrix, Matrix Server determines whether each network interface on that server meets the following conditions:
• The network interface is up and running.
• The network interface is multicast-capable.
• 802.3x Ethernet flow control is not used.
• Each network interface card (NIC) is on a separate network.
Network interfaces meeting these conditions are automatically configured into the matrix.
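The four conditions above can be expressed as a simple pre-check. The Python sketch below is illustrative only: the NetworkInterface record, its field names, and the sample values are hypothetical and are not Matrix Server data structures. It also assumes that when two NICs on a server share a network, both fail the separate-network condition; the guide does not say how Matrix Server resolves that case.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class NetworkInterface:
    name: str
    is_up: bool
    multicast_capable: bool
    flow_control_8023x: bool  # True if 802.3x Ethernet flow control is in use
    network: str              # network the NIC is attached to, e.g. "10.10.1.0/24"

def unmet_conditions(nic, all_nics):
    """List the conditions from the guide that this NIC does not meet."""
    per_network = Counter(other.network for other in all_nics)
    problems = []
    if not nic.is_up:
        problems.append("interface is not up and running")
    if not nic.multicast_capable:
        problems.append("interface is not multicast-capable")
    if nic.flow_control_8023x:
        problems.append("802.3x Ethernet flow control is in use")
    if per_network[nic.network] > 1:
        problems.append("another NIC on this server is on the same network")
    return problems

# Hypothetical interfaces on one server.
nics = [
    NetworkInterface("nic0", True, True, False, "10.10.1.0/24"),
    NetworkInterface("nic1", True, True, True, "10.10.2.0/24"),
    NetworkInterface("nic2", True, False, False, "10.10.3.0/24"),
]
eligible = [n.name for n in nics if not unmet_conditions(n, nics)]
print(eligible)
```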
performance reasons, we recommend that these networks be isolated from the networks used by external clients to access the matrix.
When Matrix Server is started, the PanPulse process selects the administrative network from the available networks. When a new server joins the matrix, the PanPulse process on that server tries to use the established administrative network.
Each network interface is labeled “Hosting Enabled” or “Hosting Disabled,” which indicates whether it can be used for virtual hosts. The Management Console uses the following icons to represent the status of each network interface.
The network interface allows administrative traffic.
The network interface discourages administrative traffic.
If Matrix Server must use a network that was configured to discourage administrative traffic, it will fail over to a network that allows the traffic as soon as that network becomes available to all servers in the matrix.
If multiple interface failures occur on a server and there is not another network available for the administrative network, the server may drop out of the matrix. The remaining servers will continue to use the existing administrative network.
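The preference described above (use a network that allows administrative traffic when one is available to all servers, and fall back to a discouraged network otherwise) can be sketched as follows. This Python example is a simplified illustration and is not the PanPulse selection algorithm; the dictionary keys and network names are hypothetical.

```python
def choose_admin_network(networks):
    """Pick a network for administrative traffic.

    `networks` is a list of dicts with hypothetical keys:
      name             - label for the network
      allows_admin     - True if the network allows administrative traffic
      available_to_all - True if every server in the matrix can use it
    Returns the chosen network name, or None if no network is usable.
    """
    usable = [n for n in networks if n["available_to_all"]]
    if not usable:
        return None
    preferred = [n for n in usable if n["allows_admin"]]
    # Fall back to a discouraged network only when no allowed one is usable.
    return (preferred or usable)[0]["name"]

nets = [
    {"name": "net_a", "allows_admin": False, "available_to_all": True},
    {"name": "net_b", "allows_admin": True, "available_to_all": False},
]
before = choose_admin_network(nets)   # only the discouraged network is usable
nets[1]["available_to_all"] = True    # the allowed network becomes available
after = choose_admin_network(nets)
print(before, after)
```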
Server: The name or IP address of the server that will include the new network interface.
IP: Type the IP address for the network interface.
Net Mask: Type the net mask for the network interface.
Allow Administrative Traffic: Specify whether the network interface can host administrative traffic. The default is to allow the traffic.
The mx command to remove a network interface is as follows:
mx netif delete

Allow or Discourage Administrative Traffic
By default, all network interfaces allow administrative traffic. However, you can specify which networks you prefer to use for this traffic. To allow or discourage administrative traffic on a network interface, select that network interface on the Servers window, right-click, and then select either “Allow Admin.
5 Configure the SAN
SAN configuration includes the following:
• Import SAN disks into the matrix.
• Deport SAN disks from the matrix.
• Display information about SAN disks.

Overview

SAN Configuration Requirements
Be sure that your SAN configuration meets the requirements specified in the PolyServe Matrix Server Installation Guide.

Storage Control Layer Module
The Storage Control Layer (SCL) module manages shared SAN devices.
access the device. Although the identifiers (such as psd2 or psd2p6) appear on certain PolyServe Management Console windows, they are generally only needed for internal use by Matrix Server.

Device Identifiers and GPT Disks
When the SCL assigns device identifiers to the partitions on GPT disks, it skips the first partition because that partition cannot be used by Matrix Server.
The higher-numbered partitions will continue to work correctly; however, you should be aware of the following:
• A new volume cannot include subdevices having partition numbers above 31. Existing volumes cannot be extended to include the higher-numbered partitions.
• You will not be able to take a hardware snapshot of partitions with numbers above 31.
• A single Alert will be issued if any disks or volumes in the matrix contain unsupported partitions.
• Disks containing an active membership partition can be imported; however, the disk partition that holds the active membership partition cannot be used for a filesystem. Before importing the disk, you can run mprepair to inactivate the membership partition (see “Manage Membership Partitions with mprepair” on page 193). You will then be able to use the partition when you import the disk into the matrix.
To determine the uuid for a disk, run the following command, which prints the uuid, the size, and a vendor string for each unimported SAN disk.
mx disk status
You can also use the Disk Info window to import a disk.

Deport SAN Disks
Deporting a disk removes it from matrix control. You cannot deport a disk that contains a membership partition. To deport a disk from the PolyServe Management Console, select Storage > Disk > Deport or click the Deport icon on the toolbar.
Local Disk Information
The Disk Info window displays disk information from the viewpoint of the local server. It can be used to match the disk names appearing in the Microsoft Disk Management utility (the Local Name) with the disk names that Matrix Server uses (the PSD Name). You can also use this window to import or deport SAN disks.
NOTE: Because the first partition on GPT disks cannot be used by Matrix Server, that partition is skipped when Matrix Server assigns device identifiers to the partitions. The first identifier, psdXp1, is assigned to partition 2, the second identifier, psdXp2, is assigned to partition 3, and so on.
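The offset between GPT partition numbers and device identifiers is easy to get wrong when matching partitions by hand. The following Python sketch restates the rule from the note; the function is a hypothetical helper for illustration and is not a Matrix Server utility.

```python
def gpt_identifier(disk_number, partition_number):
    """Return the psd identifier for a partition on a GPT disk.

    Follows the rule in the note above: partition 1 is skipped, so
    partition 2 maps to psdXp1, partition 3 to psdXp2, and so on.
    """
    if partition_number < 2:
        raise ValueError("partition 1 on a GPT disk is not used by Matrix Server")
    return f"psd{disk_number}p{partition_number - 1}"

# Partitions 2 through 4 on a disk identified as psd2.
print([gpt_identifier(2, p) for p in (2, 3, 4)])
```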
The window shows the following information for each PSFS filesystem:
• The label assigned to the filesystem.
• The mount point or drive letter assigned to the filesystem. Click in the cell to see the mount point/drive letter for each server on which the filesystem is configured.
• The volume used for the filesystem. Click in the cell to see the properties for the filesystem.
• The number of CIFS shares.
The -a option also lists the partitions on each disk. When combined with -u, it displays partition information for unimported disks.
sandiskinfo -a
Disk: \\.
The -v option lists available volumes on imported disks. These volumes are not currently in use for a PSFS filesystem or a membership partition.
sandiskinfo -v
Volume: \\.\Global\psd1p2 Disk=20:00:00:04:cf:13:38:18::0
Volume: \\.\Global\psd1p3 Disk=20:00:00:04:cf:13:38:18::0
Volume: \\.\Global\psd1p4 Disk=20:00:00:04:cf:13:38:18::0
Volume: \\.\Global\psd1p5 Disk=20:00:00:04:cf:13:38:18::0
Volume: \\.\Global\psd1p7 Disk=20:00:00:04:cf:13:38:18::0
Volume: \\.
6 Configure Dynamic Volumes
Matrix Server includes a Volume Manager that you can use to create, extend, recreate, or delete dynamic volumes. Dynamic volumes allow large filesystems to span multiple disks, LUNs, or storage arrays.

Overview

Basic and Dynamic Volumes
Volumes are used to store PSFS filesystems. There are two types of volumes: dynamic and basic. Dynamic volumes are created by the Volume Manager. They can include one or more disk partitions that have been imported into the matrix.
• Striping. When a dynamic volume is created with striping enabled, a specific amount of data (called the stripe size) is written to each subdevice in turn. For example, a dynamic volume could include three subdevices and a stripe size of 64 KB. That amount of data will be written to the first subdevice, then to the second subdevice, and then to the third subdevice. This method fills the subdevices at the same rate and may provide better performance.
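The three-subdevice, 64 KB example can be worked through with a little arithmetic. The Python sketch below uses the conventional round-robin striping calculation to show which subdevice a given byte offset in the volume would land on. It is an illustration of the general technique under that assumption, not a description of the Volume Manager's actual on-disk layout; the function name is hypothetical.

```python
STRIPE_SIZE = 64 * 1024  # 64 KB, as in the example above

def locate(offset, n_subdevices=3, stripe_size=STRIPE_SIZE):
    """Map a byte offset in the volume to (subdevice index, offset on subdevice).

    Stripes are handed to subdevices in turn: stripe 0 to subdevice 0,
    stripe 1 to subdevice 1, and so on, wrapping around.
    """
    stripe_index = offset // stripe_size
    subdevice = stripe_index % n_subdevices
    row = stripe_index // n_subdevices  # completed passes over all subdevices
    return subdevice, row * stripe_size + offset % stripe_size

# The first four stripes land on subdevices 0, 1, 2, and then 0 again.
for k in range(4):
    print(k, locate(k * STRIPE_SIZE))
```

Because consecutive stripes go to different subdevices, a large sequential write is spread evenly across all of them.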
Guidelines for Creating Dynamic Volumes
When creating striped dynamic volumes, follow these guidelines:
• The subdevices used for a striped dynamic volume should be the same size. The Volume Manager uses the same amount of space on each subdevice in the stripeset. When a striped dynamic volume is created, the Volume Manager determines the size of the smallest specified subdevice and then uses only that amount of space on each subdevice.
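The cost of mixing subdevice sizes follows directly from this rule. The Python sketch below computes the space a stripeset would use and the space left unused, based only on the rule stated above; the sizes are hypothetical and any metadata overhead is ignored.

```python
def striped_capacity(subdevice_sizes):
    """Space used by a stripeset: the smallest subdevice size on every subdevice."""
    return min(subdevice_sizes) * len(subdevice_sizes)

def unused_space(subdevice_sizes):
    """Space on the larger subdevices that the stripeset does not use."""
    return sum(subdevice_sizes) - striped_capacity(subdevice_sizes)

sizes_gb = [100, 100, 80]  # hypothetical subdevice sizes in GB
print(striped_capacity(sizes_gb), unused_space(sizes_gb))
```

In this example, 20 GB on each of the two larger subdevices goes unused, which is why same-size subdevices are recommended.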
Filesystem: If you want Matrix Server to create a filesystem that will be placed on the dynamic volume, enter a label to identify the filesystem. If you do not want a filesystem to be created, remove the checkmark from “Create filesystem after volume creation.” If you are creating a filesystem, you can also set various filesystem options.
Chapter 6: Configure Dynamic Volumes 65 Striping: If you want this volume to use striping, check the “Enable striping” checkbox and then select the size of the stripe. Although the default stripe size of 64KB should be adequate for most applications and hardware configurations, you may need to use a different stripe size for your particular circumstances. Click either OK or APPLY to create the volume. (APPLY keeps the window open, allowing you to create additional dynamic volumes.
Chapter 6: Configure Dynamic Volumes 66 The Stripe State reported in the “Dynamic Volume Properties” section will be one of the following: • Unstriped. The volume is concatenated and striping is not in effect. • Optimal. The volume has only one stripeset that includes all subdevices. Each subdevice is written to in turn. • Suboptimal. The volume has been extended and includes more than one stripeset. The subdevices in the first stripeset will be completely filled before writes to the next stripeset begin.
Chapter 6: Configure Dynamic Volumes 67 View Stripeset Information To see the contents of a stripeset, run mpdump.exe with no options from the Command Prompt. The command is in the directory Program Files\Polyserve\MatrixServer\bin on the drive where you installed Matrix Server. Following is some sample output. Current Product MP Version: 2 Membership Partition Version: 2 Membership Partitions: 10:00:00:50:13:b3:41:66::63/2 (ONLINE) . . .
Chapter 6: Configure Dynamic Volumes 68 To extend a dynamic volume on the Management Console, select Storage > Dynamic Volume > Extend Volume and then choose the volume that you want to extend. If a filesystem is on the volume, the Extend Dynamic Volume window shows information for both the dynamic volume and the filesystem. Dynamic Volume Properties: The current properties of this dynamic volume. Filesystem Properties: The properties for the filesystem located on this dynamic volume.
Chapter 6: Configure Dynamic Volumes 69 When you click OK, the dynamic volume will be extended. To extend a dynamic volume from the command line, use this command: mx dynvolume extend Delete a Dynamic Volume When a dynamic volume is deleted, the filesystem on that volume, and any persistent mounts for the filesystem, are also deleted.
Chapter 6: Configure Dynamic Volumes 70 Recreate a Dynamic Volume Occasionally you may want to recreate a dynamic volume. For example, you might want to implement striping on a concatenated volume or, if a striped dynamic volume has been extended, you might want to recreate the volume to place all of the subdevices in the same stripe set. When a dynamic volume is recreated, the Volume Manager first destroys the volume and then creates it again using the subdevices and options that you select.
Chapter 6: Configure Dynamic Volumes 71 You can change or reorder the subdevices used for the volume and enable striping if desired. To recreate a volume from the command line, you will first need to use the dynvolume destroy command and then run the dynvolume create command. Convert a Basic Volume to a Dynamic Volume If you have PSFS filesystems that were created directly on an imported disk partition or LUN (a basic volume), you can convert the basic volume to a dynamic volume.
Chapter 6: Configure Dynamic Volumes 72 NOTE: The new dynamic volume is unstriped. It is not possible to add striping to a converted dynamic volume. If you want to use striping, you will need to recreate the volume. To convert a basic volume, select the associated PSFS filesystem on the Filesystems tab of the PolyServe Management Console, right-click, and select Convert to Dynamic Volume. A warning then appears, stating that Matrix Server must unmount the filesystem, which will close any open files.
7 Configure PSFS Filesystems PolyServe Matrix Server provides the PSFS filesystem. This direct-access shared filesystem enables multiple servers to concurrently read and write data stored on shared SAN storage devices. A journaling filesystem, PSFS provides live crash recovery.
Chapter 7: Configure PSFS Filesystems 74 The PSFS filesystem does not migrate processes from one server to another. If you want processes to be spread across servers, you will need to take the appropriate actions. Journaling Filesystem When you initiate certain filesystem operations such as creating, opening, or moving a file or modifying its size, the filesystem writes the metadata, or structural information, for that event to a transaction journal. The filesystem then performs the operation.
Chapter 7: Configure PSFS Filesystems 75 Filesystem Management and Integrity Matrix Server uses the SANPulse process to manage PSFS filesystems. SANPulse performs the following tasks. • Coordinates filesystem mounts, unmounts, and crash recovery operations. • Checks for matrix partitioning, which can occur when matrix network communications are lost but the affected servers can still access the SAN.
Chapter 7: Configure PSFS Filesystems 76 Disk Quotas Disk quotas are enabled or disabled at the filesystem level. When quotas are enabled, the filesystem performs quota accounting to track the disk use of each user having an assigned disk quota. When you create a filesystem and enable quotas, you can also set options including the default hard and soft limits for users on the filesystem. A hard limit specifies the maximum amount of disk space in the filesystem that can be used by files owned by the user.
Chapter 7: Configure PSFS Filesystems 77 Before creating a filesystem, verify the following: • The disk is partitioned appropriately. If you want to change the partition layout, you will need to deport the disk, modify the partition layout with the Windows Disk Management utility, and then reimport the disk. • The volume to be used for the filesystem does not contain needed data. The new filesystem will write over an existing filesystem currently on the volume.
Chapter 7: Configure PSFS Filesystems 78 Available Volumes: This part of the window lists the basic or dynamic volumes that are currently unused. Select one of these volumes for the filesystem. NOTE: The Create a Filesystem window identifies volumes by their Matrix Server names such as psd1p2. To match these names to their local Windows names, open the Disk Info window (select the server on the Servers tab, right-click, and then select View Local Disk Info).
You can then set default hard and soft quotas for users on that filesystem. If you do not want a default limit, click “Unlimited,” which is the default. To assign a limit, click “Limit” and then specify the appropriate size in either kilobytes, megabytes, gigabytes, or terabytes. The defaults are rounded down to the nearest filesystem block. NOTE: The default user quotas apply to all users who do not have an individual quota assigned.
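The rounding of a default limit to a filesystem block can be sketched in Python as follows. The 4096-byte block size and the binary (1024-based) unit multipliers are assumptions made for illustration; this guide does not state either value. The names `UNITS` and `default_limit_bytes` are hypothetical.

```python
# Assumed binary multipliers; the guide does not say whether units are 1000- or 1024-based.
UNITS = {
    "kilobytes": 1024,
    "megabytes": 1024 ** 2,
    "gigabytes": 1024 ** 3,
    "terabytes": 1024 ** 4,
}


def default_limit_bytes(value, unit, block_size=4096):
    """Convert a limit to bytes and round it down to a whole number of blocks.

    block_size is a placeholder value, not the documented PSFS block size.
    """
    raw = int(value * UNITS[unit])
    return raw - (raw % block_size)


if __name__ == "__main__":
    print(default_limit_bytes(10, "kilobytes"))  # 10240 bytes requested, stored as whole blocks
```

With these assumed values, a limit that is not a multiple of the block size is stored as the next lower multiple.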
The Quota Assignment Policy tab lets you select a default quota for new users who do not have an explicit quota limit. The users inherit the default setting the first time that they own a file on the filesystem.
Chapter 7: Configure PSFS Filesystems 81 There are two options: • Static default quota. The default limits are explicitly assigned to the user. Subsequent changes to the default values for the filesystem do not affect the quota limits for the user. This is the default, and matches the NTFS policy. • Dynamic default quota. No explicit default limits are assigned to the user. Instead, the effective limits applied to the user are the default values for the filesystem at the time of each operation.
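The difference between the two policies can be modeled with a short Python sketch. It is a conceptual model only and does not reflect how PSFS stores quota records. The class and method names are hypothetical.

```python
class Filesystem:
    """Toy model of static versus dynamic default quota assignment."""

    def __init__(self, default_hard, policy="static"):
        self.default_hard = default_hard  # filesystem-wide default hard limit
        self.policy = policy              # "static" or "dynamic"
        self.explicit = {}                # user -> explicitly assigned hard limit

    def first_file_owned(self, user):
        # Static policy: copy the current default into the user's own record.
        if self.policy == "static" and user not in self.explicit:
            self.explicit[user] = self.default_hard

    def effective_hard_limit(self, user):
        # Dynamic policy: no record exists, so the current default applies.
        return self.explicit.get(user, self.default_hard)


if __name__ == "__main__":
    for policy in ("static", "dynamic"):
        fs = Filesystem(default_hard=100, policy=policy)
        fs.first_file_owned("alice")
        fs.default_hard = 200  # the administrator later changes the default
        print(policy, fs.effective_hard_limit("alice"))
```

In this model, a later change to the filesystem default affects only users under the dynamic policy.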
Recreate a Filesystem If you want to reformat a filesystem, select the filesystem on the Filesystems window, right-click, and select Recreate Filesystem. A message then appears stating that Matrix Server needs to unmount the filesystem and that any unsaved files will be lost. When you click Yes, the following window appears. Click the Options button to see the available options for the filesystem.
Chapter 7: Configure PSFS Filesystems 83 • -q Suppress messages. • -l
• static-default or dynamic-default With static-default, quota limits for new users are copied from the default quota values set for the filesystem. With dynamic-default, quota limits for new users are linked to the default quota values set for the filesystem. If the default quota values are changed, the users’ quota limits will also change. static-default is the default.
Chapter 7: Configure PSFS Filesystems 85 • [--quotas] Enable quotas on the filesystem. • [--defaultUserHardLimit ] The default hard limit on the filesystem. unlimited specifies that there is no default. The optional size modifiers specify that the size is in kilobytes (K), megabytes (M), gigabytes (G), or terabytes (T). If a modifier is not specified, the size will be calculated in bytes. (The default is rounded down to the nearest filesystem block.
Drive Letters and Mount Paths To provide access to a PSFS filesystem, you will need to associate it with a drive letter or a mount path. Assign Drive Letters or Paths To assign a drive letter or mount path, select the filesystem on the Filesystems tab on the PolyServe Management Console, right-click, and select Assign Path. The assignment is made on all servers in the matrix.
Chapter 7: Configure PSFS Filesystems 87 You can also assign a drive letter or path from the command line: mx fs assignpath --path If a server was out of the matrix while the drive assignment was made or you add a new server to the matrix, you can use the above command to add the drive assignment to that server.
Chapter 7: Configure PSFS Filesystems 88 Remove Drive Letter or Path Assignments If you no longer want to associate a filesystem with a particular drive letter or mount path, you can remove the assignment. Before doing this, be sure that applications are not currently accessing the filesystem via the drive letter or mount path. To remove a drive letter or path assignment, select the filesystem on the Filesystems tab, right-click, and then select Unassign Paths.
Chapter 7: Configure PSFS Filesystems 89 View or Change Filesystem Properties To see information about a specific filesystem, select that filesystem, right-click, and select Properties. Label: This field specifies the label that is assigned to the filesystem. If the filesystem does not have a label, the field will be blank. You can change the label if necessary. Volume Tab On the Properties window, the Volume tab provides information about the storage device and allows you to extend the filesystem.
window to increase the size of the PSFS filesystem to the maximum size of the volume. When you click on the Extend Filesystem button, you will see a warning such as the following. When you click Yes, Matrix Server will extend the filesystem to use all of the available space. If you want to increase the size of both the volume and the filesystem, use the Extend Volume option. For dynamic volumes, see “Extend a Dynamic Volume” on page 67.
Chapter 7: Configure PSFS Filesystems 91 Quotas Tab The Quotas tab allows you to enable or disable quotas on the filesystem, to set the default hard and soft limits, and to configure other quota options. See “Filesystem Options” on page 78 for more information about the quota options. View Filesystem Status from the Command Line You can use the following mx command to see status information. mx fs status [--verbose] [--standard|--snapshot] The command lists the status of each filesystem.
Chapter 7: Configure PSFS Filesystems 92 Extend a Basic Volume and Its Filesystem The PolyServe Management Console provides an option to increase the size of a PSFS filesystem and the basic volume, or partition, on which it is located. NOTE: This option cannot be used to extend filesystems on disks containing a Matrix Server membership partition. Select the filesystem on the Management Console, right-click, and select Extend Volume.
Chapter 7: Configure PSFS Filesystems 93 on the disk that will be deported. Users will not be able to access these filesystems until the resize operation is complete. When you click OK on the Confirm Extend window, Matrix Server deports the disk, resizes the filesystem partition by the specified size, reimports the disk, and then expands the filesystem to fill the additional space in the partition.
Chapter 7: Configure PSFS Filesystems 94 The next example uses a mount path: psfssuspend c:\psfs_mount\ The psfssuspend command prevents modifications to the filesystem and forces any changed blocks associated with the filesystem to disk. The command performs these actions on all servers that have mounted the filesystem and then returns successfully. Any process attempting to modify a suspended filesystem will block until the filesystem is resumed.
Chapter 7: Configure PSFS Filesystems 95 The device can be specified in several ways: • By the drive letter, such as X: • By the mount point (junction), such as C:\san\vol2 • By the psd or psv name, such as psd2p2 or psv3 Perform a Filesystem Check If a filesystem is not unmounted cleanly, the journal will be replayed the next time the filesystem is mounted to restore consistency. You should seldom need to check the filesystem.
For more information about the check, click the Details button. If psfscheck locates errors that need to be repaired, it will display a message telling you to run the utility from the command line. For more information, see the PolyServe Matrix Server Command Reference.
8 Manage Disk Quotas The PSFS filesystem supports disk quotas, which limit the amount of disk space on a filesystem that can be used for individual user’s files. Hard and Soft Filesystem Limits The PSFS filesystem supports both hard and soft filesystem quotas. A hard quota specifies the maximum amount of disk space on a particular filesystem that can be used by files owned by the user.
Chapter 8: Manage Disk Quotas 98 When you create a PSFS filesystem, you can specify whether quotas should be enabled and you can set quota options on the filesystem. (See “Create a Filesystem” on page 76.) Quotas can also be enabled or disabled on an existing filesystem, using either the PolyServe Management Console or Matrix Server commands. The filesystem will be unmounted briefly during the enable/disable operation.
Chapter 8: Manage Disk Quotas 99 Check or uncheck “Enable quotas” as appropriate. If you are enabling quotas, you can set the default hard and soft quotas for users on that filesystem. To do this, click on “Limit” and then specify the appropriate size in either kilobytes, megabytes, gigabytes, or terabytes. The default is rounded down to the nearest filesystem block. (If you do not want a default limit, click “Unlimited.”) The default quotas apply to all users who do not have individual quotas.
Chapter 8: Manage Disk Quotas 100 Manage User Quotas The mx quota command can be used to manage user quotas from the command line. See the PolyServe Matrix Server Command Reference for details about this command. You can also use Microsoft Windows features such as the following to manage user quotas. Refer to the Windows documentation for more information about these features. Quota GUI. The Windows Quota GUI can be accessed from Microsoft Windows Explorer.
The Quota Entries window. This window can be accessed via Microsoft Windows Explorer. Display the Properties for the filesystem, select the Quota tab, and then click the Quota Entries button. When using the Quota Entries window, you should be aware of the following: • The “Amount Used” column includes PSFS metadata as well as the space required for the user data in each user’s files. The space used may be different than it would be on another type of filesystem.
Chapter 8: Manage Disk Quotas 102 Back Up and Restore Quotas The psfsdq and psfsrq commands can be used to back up and restore the quota information stored on the PSFS filesystem. These commands should be run in conjunction with standard filesystem backup utilities, as those utilities do not save the quota limits set on the filesystem. NOTE: We recommend that you use the psfsdq and psfsrq commands instead of the Import and Export options on the Quota Entries window.
Examples
The following command saves the quota information for the filesystem located on device psd1p5.
psfsdq -f psd1p5.quotadata psd1p5
The next command restores the data to the filesystem:
psfsrq -f psd1p5.quotadata psd1p5
9 Manage Hardware Snapshots Matrix Server provides support for taking hardware snapshots of PSFS filesystems. The subdevices on which the filesystems are located must reside on one or more storage arrays that are supported for snapshots. Snapshot support can be configured on the Management Console “Configure Matrix” window. (See the PolyServe Matrix Server Installation Guide for more information.) This procedure creates a snapshot configuration file on each server.
Chapter 9: Manage Hardware Snapshots 105 Create a Snapshot or Snapclone A snapshot is a space-efficient reflection of a source filesystem at a particular point in time. Snapshots initially consume storage space only to store pointers to the data in the source filesystem, growing in size when source filesystem data is changed.
Chapter 9: Manage Hardware Snapshots 106 The label is used to identify the snapshot on the Management Console. Check “Share as Shadow Copy of Shared Folder” if you want users to be able to use the snapshot as a shadow copy. The remainder of the dialog describes the information that you will need to enter. When you complete the information and click OK, Matrix Server takes these steps: • Quiesces the filesystem to ensure that the snapshot can be mounted cleanly.
Chapter 9: Manage Hardware Snapshots 107 To create a snapshot from the command line, first run the following command to determine the options available for the array type on which the specified volume is located: mx snapshot showcreateopt Then run the following command to create the snapshot: mx snapshot create [--terse] [] The --terse option causes only the name of the snapshot volume to be printed on success.
Chapter 9: Manage Hardware Snapshots 108 To delete a snapshot from the command line, type the following: mx snapshot destroy Snapclone devices, like regular filesystem LUNs, cannot be deleted from the Management Console. To delete snapclones, you must destroy the filesystem and/or volume, deport the LUN(s), and delete the LUN(s) with the array-specific utilities.
Chapter 9: Manage Hardware Snapshots 109 Guide. CIFS clients then use Windows Explorer to access the shadow copies. To enable a CIFS client to use shadow copies, you will need to install the Shadow Copies of Shared Folders client pack on the CIFS client workstation. The client pack is provided in the Windows 2003 distribution and can be installed from the following location, where C is the drive where Windows 2003 is installed: C:\windows\system32\clients\twclient\x86\twcli32.
10 Matrix Operations on the Applications Tab The Applications tab on the Management Console shows all Matrix Server applications, virtual hosts, service monitors, and device monitors configured in the matrix and enables you to manage and monitor them from a single screen. Applications Overview An application provides a way to group associated matrix resources (virtual hosts, service monitors, and device monitors) so that they can be treated as a unit.
Chapter 10: Matrix Operations on the Applications Tab 111 a device monitor, the application will use the same name as the device monitor. The Applications Tab The Management Console lists applications and their associated resources (virtual hosts, service and device monitors) on the Applications tab. The applications and resources appear in the rows of the table. (Double-click on a resource to see its properties.) The servers on which the resources are configured appear in the columns.
Chapter 10: Matrix Operations on the Applications Tab 112 The cells indicate whether a resource is deployed on a particular server, as well as the current status of the resource. If a cell is empty, the resource is not deployed on that server. The icons used on the Applications tab report the status of the servers, applications, and resources. The following icons are used in the server columns to indicate the status of applications and resources.
The possible states for the application are:
Status    Meaning
OK        Clients can access the application.
Warning   Clients can access the application but not from the primary node.
Error     Clients cannot access the application.
In the following example, the status for most of the applications is OK because clients are accessing the application through the primary server. However, the status of application 99.11.14.
Filter the Applications Display You can use filters to limit the information appearing on the Applications tab. For example, you may want to see only a certain type of monitor, or only monitors that are down or disabled. To add a filter, click the “New Filter” tab and then configure the filter. Name: Specify a name for this filter.
Chapter 10: Matrix Operations on the Applications Tab 115 Click OK to close the filter. The filter then appears as a separate tab and will be available to you when you connect to any cluster. (Filters are stored per user under the registry key.) To modify an existing filter, select that filter, right-click, and select Edit Filter. To remove a filter, select the filter, right-click, and select Delete Filter.
When you reach a cell that accepts drops, the cursor will change to an arrow. The following drag and drop operations are allowed. Applications These operations are allowed only for applications that include at most one virtual host. • Assign an application to a server. Drag the application from the Name column to the empty cell for the server.
Chapter 10: Matrix Operations on the Applications Tab 117 • Switch the primary and backup servers (or two backup servers) for a virtual host. Drag the virtual host from one server cell to the cell for the other server. If the virtual host is active, this operation can disconnect existing applications that depend on the virtual host. When the operation is complete, the ordering for failover will be switched. • Remove a virtual host from a server.
Chapter 10: Matrix Operations on the Applications Tab 118 reordered as necessary. If the monitor was multi-active, it will remain active on any other servers on which it is configured. Menu Operations Applications The following operations affect all entities associated with a Matrix Server application. These operations can also be performed from the command line, as described in the PolyServe Matrix Server Command Reference.
• Add a service monitor. • Enable or disable the virtual host. • View or change the properties for the virtual host. • Delete the virtual host. To perform these procedures, left-click on the cell for the virtual host (click in the Name column). Then right-click and select the appropriate operation from the menu. See Chapter 11, “Configure Virtual Hosts” on page 120 for more information about these procedures.
11 Configure Virtual Hosts PolyServe Matrix Server uses virtual hosts to provide failover protection for servers and network applications. Overview A virtual host is a hostname/IP address configured on a set of network interfaces. Each interface must be located on a different server. The first network interface configured is the primary interface for the virtual host. The server providing this interface is the primary server.
Chapter 11: Configure Virtual Hosts 121 Matrix Health and Virtual Host Failover To ensure the availability of a virtual host, Matrix Server monitors the health of the administrative network, the active network interface, and the underlying server. If you have created service or device monitors, those monitors periodically check the health of the specified services or devices.
Chapter 11: Configure Virtual Hosts 122 The failover operation to another network interface has minimal impact on clients. For example, if clients were downloading Web pages during the failover, they would receive a “transfer interrupted” message and could simply reload the Web page. If they were reading Web pages, they would not notice any interruption. If the active network interface fails, only the virtual hosts associated with that interface are failed over.
Add or Modify a Virtual Host To add or update a virtual host from the PolyServe Management Console, select the appropriate option: • To add a new virtual host, select Matrix > Add > Add Virtual Host or click the V-Host icon on the toolbar. Then configure the virtual host on the Add Virtual Host window. • To update an existing virtual host, select that virtual host on either the Servers or Virtual Hosts window, right-click, and select Properties.
Chapter 11: Configure Virtual Hosts 124 existing application name, or leave this field blank. However, if you do not assign a name, Matrix Server will use the IP address for the virtual host as the application name. Always active: If you check this box, upon server failure, the virtual host will move to an active server even if all associated service and device monitors are inactive or down.
Chapter 11: Configure Virtual Hosts 125 Network Interfaces: When the “All Servers” box is checked, the virtual host will be configured on all servers having an interface on the network you select for this virtual host. When you add another server to the matrix, the virtual host will automatically be configured on that server. This option can be useful with administrative applications. Available:/Members: The Available column lists all network interfaces that are available for this virtual host.
Chapter 11: Configure Virtual Hosts 126 Configure Applications for Virtual Hosts After creating virtual hosts, you will need to configure your network applications to recognize them. For example, if you are using a Web server, you may need to edit its configuration files to recognize and respond to the virtual hosts. By default, FTP responds to any virtual host request it receives.
Chapter 11: Configure Virtual Hosts 127 Rehost a Virtual Host You can use the Rehost option to modify the configuration of a virtual host. For example, you might want to change the primary for the virtual host or reorder the backups. To use this option, select the virtual host, right-click, and then select Rehost. The Virtual Host Rehost window then appears. When you make your changes and click OK, you will see a message warning that this action may cause a disruption of service.
Chapter 11: Configure Virtual Hosts 128 Change the Virtual IP Address for a Virtual Host When you change the virtual IP address of a virtual host, you will also need to update your name server and to configure applications to recognize the new virtual IP address. The order in which you perform these tasks is dependent on your application and the requirements of your site. You can use mx commands to change the virtual IP address of a virtual host. Complete these steps: 1.
Chapter 11: Configure Virtual Hosts 129 When certain events occur on the server where a virtual host is located, the ClusterPulse process will attempt to fail over the virtual host to another server configured for that virtual host. For example, if the server goes down, ClusterPulse will check the health of the other servers and then determine the best location for the virtual host.
• The PanPulse process controls whether a network interface is marked up or down. When PanPulse determines that an interface currently hosting a virtual host is down, ClusterPulse will begin searching for another server on which to locate the virtual host. 3. ClusterPulse narrows the list to those servers without inactive, down, or disabled Matrix Server device monitors. If there are no servers that meet these criteria, the virtual host is not made active anywhere.
Specify Failover/Failback Behavior The Probe Severity setting allows you to specify whether a failure of the service or device monitor probe should cause the virtual host to fail over. For example, you could configure a gateway device monitor to watch a router. The device monitor probe might occasionally time out because of heavy network traffic to the router even though the router is still functioning.
Chapter 11: Configure Virtual Hosts 132 • For service monitors, you can assign a priority to each monitor (the Service Priority setting). If ClusterPulse cannot locate an interface where all services are “up” on the underlying server, it selects an interface where the highest priority service is “up” on the underlying server.
Chapter 11: Configure Virtual Hosts 133 have three up service monitors. The NOFAILBACK policy will not move the virtual host until a healthier server is available. • After the virtual host fails over to node 2, a service monitor probe fails on that node. Now both nodes have a down service monitor. Failback does not occur because the servers are equally healthy. If the failed service is then restored on node 1, that node will now be healthier than node 2 and failback will occur.
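The failback scenario above can be traced with a short Python sketch. It assumes that server health is measured only by the number of "up" service monitors and that the virtual host moves only to a strictly healthier server. The real ClusterPulse decision also considers factors such as network interfaces, device monitors, and service priorities, which this sketch omits. The function name `place` is hypothetical.

```python
def place(current, up_counts):
    """Return the server that should host the virtual host.

    up_counts maps each server name to its number of 'up' service monitors.
    Simplifying assumption: move only when another server is strictly
    healthier than the current one; equally healthy servers cause no move.
    """
    best = max(up_counts, key=up_counts.get)
    return best if up_counts[best] > up_counts[current] else current


if __name__ == "__main__":
    host = "node1"
    host = place(host, {"node1": 2, "node2": 3})  # a probe fails on node 1
    print(host)
    host = place(host, {"node1": 2, "node2": 2})  # a probe then fails on node 2
    print(host)
    host = place(host, {"node1": 3, "node2": 2})  # the service is restored on node 1
    print(host)
```

The three calls follow the sequence in the text: fail over to node 2, no failback while the servers are equally healthy, then failback once node 1 is healthier.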
12 Configure Service Monitors Service monitors are typically used to monitor a network service such as HTTP or FTP. If a service monitor indicates that a network service is not functioning properly on the primary server, Matrix Server can transfer the network traffic to a backup server that also provides that network service. Overview Before creating a service monitor for a particular service, you will need to configure that service on your servers.
Chapter 12: Configure Service Monitors 135 severity, Start scripts, and Stop scripts) are consistent across all servers configured for a virtual host. Service Monitors and Failover If a monitored service fails, Matrix Server attempts to relocate any virtual hosts associated with the service monitor to a network interface on a healthier server.
Chapter 12: Configure Service Monitors 136 FTP Service Monitor By default the FTP service monitor probes TCP port 21 of the virtual host address. You can change this port number to the port number configured for your FTP server. The default frequency of the probe is every 30 seconds. The default time that the service monitor waits for a probe to complete is five seconds. The probe function attempts to connect to port 21 and expects to read an initial message from the FTP server.
Chapter 12: Configure Service Monitors 137 service if it is not already started. When the service monitor instance becomes inactive, the monitor stops the NT service. When you configure the monitor, you will need to indicate whether dependent services of the NT service should also be started and stopped.
Chapter 12: Configure Service Monitors 138 TCP Service Monitor The generic TCP service monitor defaults to TCP port 0. You should set the port to the listening port of your server software. The default frequency of the probe is every 30 seconds. The default time that the service monitor waits for a probe to complete is five seconds. Because the service monitor cannot know what to expect from the TCP port connection, it simply attempts to connect to the specified port.
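A connect-only probe of this kind can be approximated in Python with the standard socket module. This is a minimal sketch of the general technique, not the code that Matrix Server runs. The function name `tcp_probe` is hypothetical, and the five-second default mirrors the default timeout mentioned above.

```python
import socket


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Nothing is sent or read after connecting, since a generic probe cannot
    know what the service would reply. Illustrative sketch only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Probe a local port; replace with the listening port of your server software.
    print(tcp_probe("127.0.0.1", 80))
```

A scheduler that calls such a function at the configured frequency and treats a False result as a failed probe would complete the picture.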
Add or Modify a Service Monitor

Adding a service monitor configures Matrix Server monitoring only. It does not configure the service itself. To add or update a service monitor from the PolyServe Management Console, select the appropriate option:

• To add a new service monitor, first select the virtual host for the monitor on either the Servers or Virtual Hosts window, then right-click and select Add Service Monitor (or click the Service icon on the toolbar).
Monitor Type: Select the type of service that you want to monitor.

Timeout: The maximum amount of time that the monitor_agent process will wait for a probe to complete. For most monitors, the default timeout interval is five seconds. You can use the default setting or specify a new timeout interval.

Frequency: The interval of time, in seconds, at which the monitor probes the designated service.
To add or update a service monitor from the command line, use this command:

mx service add|update [--type ] [--timeout ] [--frequency ] [] ...

NOTE: The --type option cannot be used with the mx service update command.

See “Advanced Settings for Service Monitors” for information about the other arguments that can be specified for service monitors.
Service Monitor Policy

The Policy tab lets you specify the failover behavior of the service monitor and set its service priority.

Timeout and Failure Severity

This setting works with the virtual host policy (either AUTOFAILBACK or NOFAILBACK) to determine what happens when a probe of a monitored service fails.
monitored resource is not critical, but is important enough that you want to keep a record of its health.

AUTORECOVER. This is the default. The virtual host fails over when a monitor probe fails. When the service is recovered on the original node, failback occurs according to the virtual host’s failback policy.

NOAUTORECOVER. The virtual host fails over when a monitor probe fails and the monitor is disabled on the original node, preventing automatic failback.
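The interaction between these two severities and the virtual host policy can be sketched as a small decision function. This is only an illustration of the rules stated above: the function name is hypothetical, and it assumes that AUTOFAILBACK means the virtual host returns automatically while NOFAILBACK means it does not.

```python
def failback_allowed(probe_severity: str, vhost_policy: str) -> bool:
    """Return True if the virtual host may fail back automatically
    once the service has recovered on the original node."""
    if probe_severity == "NOAUTORECOVER":
        # The monitor is disabled on the original node, so nothing
        # fails back until an administrator re-enables the monitor.
        return False
    if probe_severity == "AUTORECOVER":
        # Failback follows the virtual host's own policy (assumed meaning).
        return vhost_policy == "AUTOFAILBACK"
    raise ValueError(f"unhandled probe severity: {probe_severity}")
```

For example, `failback_allowed("AUTORECOVER", "NOFAILBACK")` evaluates to `False` under these assumptions, because the virtual host policy does not permit automatic failback.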
Probe Type

If you are creating a CUSTOM monitor, you may want to set the probe type. These monitors can be configured to be either single-probe or multi-probe. A multi-probe monitor performs the probe function on each node where the monitor is configured, regardless of whether the monitor instance is active or inactive. The built-in monitors work in this manner. Single-probe monitors perform the probe function only on the node where the monitor instance is active.
Scripts

Service monitors can optionally be configured with scripts that are run at various points during matrix operation. The script types are as follows:

Recovery script. Runs after a monitor probe failure is detected, in an attempt to restore the service.

Start script. Runs as a service is becoming active on a server.

Stop script. Runs as a service is becoming inactive on a server.
without considering this to be an error. In both of these cases, the script should exit with a zero exit status. This behavior is necessary because Matrix Server runs the Start and Stop scripts to establish the desired start/stop activity, even though the service may actually have been started by something other than Matrix Server before ClusterPulse was started.
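A minimal Python sketch of this "already in the desired state is not an error" rule follows. The helper names and the callables passed in (`is_running`, `start`, `stop`) are hypothetical stand-ins for whatever your application uses to query and control the service; only the zero-exit-status convention comes from the text above.

```python
from typing import Callable


def ensure_started(is_running: Callable[[], bool],
                   start: Callable[[], bool]) -> int:
    """Return the exit status a Start script should report."""
    if is_running():
        # Already started, perhaps by something other than Matrix Server:
        # this is not an error.
        return 0
    return 0 if start() else 1


def ensure_stopped(is_running: Callable[[], bool],
                   stop: Callable[[], bool]) -> int:
    """Return the exit status a Stop script should report."""
    if not is_running():
        # Already stopped: this is not an error either.
        return 0
    return 0 if stop() else 1
```

A real script would pass the returned value to `sys.exit()` so that the exit status reflects the outcome.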
Event Severity

If a Start or Stop script fails or times out, a monitor event is created on the node where the failure or timeout occurred. Configuration errors can also cause this behavior. You can view these events on the PolyServe Management Console and clear them from the Console or command line after you have fixed the problems that caused them. When an event is created, the ClusterPulse process may initiate failover of the associated virtual host.
PARALLEL. The strict ordering sequence for Stop and Start scripts is not enforced. The scripts run in parallel across the matrix as a virtual host is in transition. The PARALLEL configuration can speed up failover time for services that do not depend on strict ordering of Start and Stop scripts.
To disable a service monitor, select it on the Management Console, right-click, and select Disable.

To disable a service monitor from the command line, use this command:

mx service disable ...

Enable a Previously Disabled Service Monitor

From the Management Console, select the service monitor to be enabled, right-click, and select Enable.
13 Configure Device Monitors

PolyServe Matrix Server provides built-in device monitors that can be used to watch local disks, gateway devices, or an NT service, or to monitor access to a SAN disk partition containing a PSFS filesystem. You can also create custom device monitors.

Overview

A device monitor is configured on one or more servers in the matrix. Depending on the type of monitor, it can be active on all servers on which it is configured, or on only one server.
Type                 Default Timeout   Default Frequency   Other Parameters
SHARED_FILESYSTEM    5 seconds         30 seconds          Filesystem, filename
CUSTOM               60 seconds        60 seconds          User probe script

Activity Types for Device Monitors

The activity type specifies where the device monitor can be active. The activity type can be one of the following:

• Single-Active. The monitor is active on only one of the selected servers.
GATEWAY Device Monitor

When certain network failures occur, the servers in a matrix can lose communication with each other. This situation can result in a partition, or split, of the matrix. For example, in a two-server matrix, each server would assume that it remained in the matrix and that the other server was down. The gateway device monitor detects the network failure and prevents the matrix from partitioning.
The monitor probe queries the status of the NT service. If the status is SERVICE_RUNNING, the service status remains Up. If the status does not indicate that the NT service is running, the service status is set to Down. The NTSERVICE monitor is also available as a service monitor. When deciding whether to create a service monitor or a device monitor, consider the effect that you want the monitor to have on the matrix.
Custom Device Monitor

A CUSTOM device monitor can be used if the built-in device types are not sufficient for your needs. Custom device monitors can be particularly useful when integrating Matrix Server with a custom application. When you create a CUSTOM monitor, you will need to supply the probe script. In the script, probe commands should determine the health of the device as necessary.
The device monitor activeness policy decision is made as follows:

1. If the device monitor on a specific server is disabled, then the device monitor will not be made active on that server.
2. ClusterPulse considers the list of servers that are both up and enabled and that are configured for the device monitor.
Add or Modify a Device Monitor

Select the appropriate option from the PolyServe Management Console:

• To add a new device monitor, select the server to be associated with the monitor from the Servers window, right-click, and select Add Device Monitor (or click the Device icon on the toolbar). Then configure the device monitor on the New Device Monitor window.
Device Type: Select the appropriate device type (DISK, GATEWAY, NTSERVICE, SHARED_FILESYSTEM, or CUSTOM). See “Overview” on page 150 for a description of these monitors.

Frequency and Timeout: These fields are set to the default values for the type of device you have selected. Change them as needed.

Additional parameters: Depending on the type of monitor you are creating, you will be asked for an additional parameter.

• DISK monitor.
decimal IP address of the hostname for the server, and is the name assigned to the SHARED_FILESYSTEM device monitor.

• CUSTOM monitor. Specify the pathname to the probe script to be used with the monitor.

The following example shows a device monitor created on the server svr1.
Probe Severity

The Probe Severity tab lets you specify the failover behavior of the monitor. The Probe Severity setting works with the virtual host policy (either AUTOFAILBACK or NOFAILBACK) to determine what happens when a monitored device fails.
monitored resource is not critical, but is important enough that you want to keep a record of its health.

AUTORECOVER. This is the default. The virtual host fails over when a monitor probe fails. When device access is recovered on the original node, failback occurs according to the virtual host’s failback policy.

NOAUTORECOVER. The virtual host fails over when a monitor probe fails and the monitor is disabled on the original node, preventing automatic failback.
Custom Scripts

The Scripts tab lets you configure custom Recovery, Start, and Stop scripts for a device monitor. Device monitors can optionally be configured with scripts that are run at various points during matrix operation. The script types are as follows:

Recovery script. Runs after a monitor probe failure is detected, in an attempt to restore the device.

Start script. Runs as a device is becoming active on a server.

Stop script.
must be robust enough to run when the device is already stopped, without considering this to be an error. In both of these cases, the script should exit with a zero exit status. This behavior is necessary because Matrix Server runs the Start and Stop scripts to establish the desired start/stop activity, even though the device may actually have been started by something other than Matrix Server before the ClusterPulse process was started.
If you want to reverse this order, preface the Stop script with the prefix [post] on the Scripts tab.

Event Severity

If a Start or Stop script fails or times out, a monitor event is created on the node where the failure or timeout occurred. Configuration errors can also cause this behavior. You can view these events on the PolyServe Management Console and clear them from the Console or command line after you have fixed the problems that caused them.
2. ClusterPulse waits for all Stop scripts to complete.
3. The Start script is run on the server where the virtual host or shared device is becoming active.

PARALLEL. The strict ordering sequence for Stop and Start scripts is not enforced. The scripts run in parallel across the matrix as a shared device or virtual host is in transition.
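The difference between the two orderings can be sketched in Python. This is a conceptual model only, not how ClusterPulse is implemented: the scripts are represented by plain callables, the function names are hypothetical, and the sketch assumes that in the strict ordering the Stop scripts themselves may run concurrently with one another before the Start script begins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Script = Callable[[], None]


def run_serial(stop_scripts: List[Script], start_script: Script) -> None:
    """Strict ordering: every Stop script finishes before Start begins."""
    if stop_scripts:
        with ThreadPoolExecutor(max_workers=len(stop_scripts)) as pool:
            futures = [pool.submit(script) for script in stop_scripts]
            for future in futures:
                future.result()  # wait for completion; re-raise script errors
    start_script()


def run_parallel(stop_scripts: List[Script], start_script: Script) -> None:
    """PARALLEL: Stop and Start scripts are launched together, unordered."""
    scripts = list(stop_scripts) + [start_script]
    with ThreadPoolExecutor(max_workers=len(scripts)) as pool:
        futures = [pool.submit(script) for script in scripts]
        for future in futures:
            future.result()
```

In the parallel form, the Start script does not wait for any Stop script, which is why it suits only services that tolerate overlapping start and stop activity.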
When a device monitor detects a failure, Matrix Server attempts to fail over the active virtual hosts associated with that monitor. By default, all virtual hosts on the servers used with the device monitor are dependent on the device monitor. However, you can specify that only certain virtual hosts be dependent on the device monitor.
Probe Type. The servers on which the monitor probe will occur. Select Single-Probe to conduct the probe only on the server where the monitor is active. Select Multi-Probe to conduct the probe on all servers configured for the monitor.

Activity Type. Where the monitor can be active. The options are:

• Single-Active. The monitor is active on only one of the selected servers.
Available Servers/Selected Servers. The type of the device monitor affects whether the monitor should be configured on one or multiple servers.

• A GATEWAY monitor is multi-active and can be configured on multiple servers.
• For SHARED_FILESYSTEM monitors, you should select the servers that mount the monitored filesystem and are running the applications that access data from that filesystem.
• A GATEWAY monitor can be configured on multiple servers.
Enable a Device Monitor

From the Management Console, select the device monitor to be enabled, right-click, and select Enable.

To enable a device monitor from the command line, use this command:

mx device enable ...

View Device Monitor Errors

To view the last error that occurred on a device monitor, select that monitor, right-click, and select View Last Error.
14 Configure Notifiers

If you would like certain actions to take place when matrix events occur, you can configure notifiers that define how the events should be handled.

Overview

PolyServe Matrix Server uses notifiers to enable you to view event information generated by servers, network interfaces, virtual hosts, service monitors, device monitors, and filesystems. Notifiers send events from these entities to user-defined notifier scripts.
When adding a notifier, you will need to specify a name for the notifier and to supply the script to be run when an event is triggered that matches the event and entity combination. The notifier script will be run with any arguments that you included in the script string. The script may read STDIN to accept the event message.
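As an illustration of a notifier script that reads the event message from STDIN, here is a minimal Python sketch. It assumes the message arrives as a single whitespace-separated line laid out like the test event shown later in this chapter (server address, level, entity, a numeric field, date, time, then free text). The field names and the interpretation of each position are assumptions made for this example, not a documented format.

```python
import sys
from typing import NamedTuple, TextIO


class Event(NamedTuple):
    server: str      # assumed: address of the reporting server
    level: str       # assumed: event level, e.g. "Error"
    entity: str      # assumed: entity name, e.g. "SERVICEMONITORS"
    code: str        # assumed: numeric field following the entity
    timestamp: str   # month, day, year, and clock time joined together
    text: str        # remainder of the message


def parse_event(line: str) -> Event:
    """Split one event line into its assumed fields."""
    parts = line.strip().split(None, 8)
    if len(parts) < 8:
        raise ValueError(f"unexpected event format: {line!r}")
    server, level, entity, code, month, day, year, clock = parts[:8]
    text = parts[8] if len(parts) > 8 else ""
    return Event(server, level, entity, code,
                 f"{month} {day} {year} {clock}", text)


def main(stream: TextIO = sys.stdin) -> int:
    """Read one event from the stream and print a short summary."""
    event = parse_event(stream.readline())
    print(f"[{event.level}] {event.entity} on {event.server}: {event.text}")
    return 0
```

A script built this way would end with `sys.exit(main())`; the summary line could just as easily be appended to a log file or handed to a mail tool.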
Script: Enter the name of the script that will be run when an event occurs.

Event: Check the events for which you want to receive notification.

Entity: Check the entities for which you want to receive notification. The USER1 - USER7 entities are user-defined entities for the mxlogger command. See “Add Your Own Messages to the Event Log” on page 210.

The notifier now appears in the Notifiers window.
Enable a Notifier

Select the notifier to be enabled from the Notifiers window, right-click, and select Enable.

To enable a notifier from the command line, use this command:

mx notifier enable ...

Test a Notifier

Select the notifier to be tested from the Notifiers window, right-click, and select Test. The event messages for each configured entity will now be sent to the notifier.
The Test Notifier option causes a test event to be generated for each of the event/entity combinations that you configure for the notifier. Following is an example:

10.10.1.1 Error SERVICEMONITORS 0 Oct 31 2000 13:31:31 TEST Notifier message for

A Sample Notifier Script

The following batch file can be used in conjunction with a command-line tool to send e-mail when notifier events are generated.

SET/P INPUT=
SET LOG=C:\BATCH.
15 Test Your Configuration

After you have configured Matrix Server, we recommend that you perform a set of basic tests to validate that SAN shared filesystem operation, virtual host operation and failover, DNS load-balancing operation and failover, and failover of the LAN administrative network work correctly. After completing these tests successfully, you may want to run a more substantial test of your specific requirements to validate that Matrix Server is working in your environment.
3. Create a PSFS filesystem on an unused partition on this disk.
4. Log into each of the servers as Administrator and perform some basic I/O tests to the shared filesystem using a tool such as WinZip. Verify that changes to the filesystem made by each server are visible to all other servers in the matrix.

Test Filesystem Failure Recovery

1. Configure the matrix with a shared filesystem as described in the previous procedure.
2.
Test Virtual Host Operation and Failover

The following procedure tests automatic failover and recovery reintegration. It is best to run these tests in a non-production environment.

Test Failure and Reintegration of Servers

Use the following procedure to test server failure, failover, and reintegration:

1. From the PolyServe Management Console, log into a backup matrix server.
2. Configure the matrix with a single virtual host.
3.
5. Verify that Matrix Server detects the service failure. The virtual host should be inactive on the primary server and active on the first backup server.
6. Start the service that you are testing on the primary server.
7. Verify that Matrix Server detects that the service has become active.
8. Verify that the virtual host is active on the primary server and inactive on the backup servers.
Validate Correct Load-Balancing Operation

The following procedure validates that DNS round robin and Matrix Server are working correctly. To test the access, use the %SYSTEMROOT%\system32\PING.EXE utility.

1. Ping www.acmd.com.
2. Validate that the returned IP address is 192.168.100.1.
3. Ping www.acmd.com.
4. Validate that the returned IP address is 192.168.100.2.
5. Continue to ping and validate that the opposite IP address is returned by DNS.
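The manual ping procedure above can also be approximated with a short Python sketch that resolves the name several times and checks that the answers alternate. The function names are hypothetical, and the sketch assumes that each lookup actually reaches DNS; a caching resolver on the client may return the same address repeatedly even when round robin is configured correctly.

```python
import socket
from typing import Callable, Iterable, List, Set


def resolve_many(hostname: str, attempts: int,
                 resolve: Callable[[str], str] = socket.gethostbyname
                 ) -> List[str]:
    """Resolve the hostname repeatedly and collect the returned addresses."""
    return [resolve(hostname) for _ in range(attempts)]


def alternates_between(addresses: Iterable[str], expected: Set[str]) -> bool:
    """True if exactly the expected addresses were seen and no two
    consecutive lookups returned the same address."""
    seen = list(addresses)
    if set(seen) != set(expected):
        return False
    return all(a != b for a, b in zip(seen, seen[1:]))
```

For the example in this procedure, one would call `resolve_many("www.acmd.com", 6)` and pass the result to `alternates_between` together with the set `{"192.168.100.1", "192.168.100.2"}`.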
Test LAN Failover of Administrative Matrix Traffic

Use the following procedure to test the LAN administrative traffic failover capability of Matrix Server:

1. Connect your matrix servers with at least two physically separate LANs. Configure the Windows network software to enable the interfaces to these networks on each of the matrix servers.
2. From the PolyServe Management Console, log into one of the matrix servers.
3.
16 Advanced Topics

The topics described here provide technical details about Matrix Server operations. This information is not required to use Matrix Server in typical configurations; however, it may be useful if you want to design custom scripts and monitors, to integrate Matrix Server with custom applications, or to diagnose complex configuration problems.

The Effect of Monitors on Failover

Typically a virtual host has a primary network interface and one or more backup interfaces.
The following examples show state transitions for a service monitor that uses the default values for autorecovery, priority, and serial script ordering. Start and Stop scripts are also defined for the monitor. The virtual host associated with the monitor has a primary interface and two backup interfaces. The first example shows the state transitions that occur at startup from an unknown state. At i1, all instances of the monitor have completed stopping.
When a failure occurs on the Primary, the virtual host needs to fail over to a backup. Matrix Server now looks for the best location for the virtual host. Because the probe status on the first backup is “down,” Matrix Server chooses the second backup, where the probe status is “up.” At i5 in the following example, the probe fails on the Primary. At i6, the virtual host is deconfigured on the Primary. At i7, the monitor stop script begins on the Primary.
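The choice of backup described in this example can be sketched as a simple search: walk the backups in their configured order and take the first one whose probe status is "up". This is a simplified model for illustration only; the function name and data shapes are hypothetical, and the real selection may weigh other factors that this sketch ignores.

```python
from typing import Dict, List, Optional


def choose_failover_target(backups_in_order: List[str],
                           probe_status: Dict[str, str]) -> Optional[str]:
    """Return the first backup whose probe status is "up", or None."""
    for server in backups_in_order:
        if probe_status.get(server) == "up":
            return server
    return None  # no healthy backup is available
```

With the statuses from the example, `choose_failover_target(["first backup", "second backup"], {"first backup": "down", "second backup": "up"})` selects the second backup.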
Custom Device Monitors

A custom device monitor is associated with a list of servers and a list of virtual hosts configured on those servers. A custom device monitor can be active on only one server at a time. On each server, the monitor uses a probe mechanism to determine whether the service is active. The probe mechanism is in one of the following states on each server: Up, Down, Unknown, Timeout. A custom device monitor also has an activity status on each server.
[Table: virtual host status, service probe status, service monitor activity, device probe status, and device monitor activity on the Primary, First Backup, and Second Backup servers over time, beginning at t1.]
Integrate Custom Applications

There are many ways to integrate custom applications with Matrix Server:

• Use service monitors or device monitors to monitor the application
• Use a predefined monitor or your own user-defined monitor
• Use Start, Stop, and Recovery scripts

Following are some examples of these strategies.
Built-In Monitor or User-Defined Monitor?

To decide whether to use a built-in monitor or a user-defined monitor, first determine whether a built-in monitor is available for the service you want to monitor and then consider the degree of content verification that you need.
This script connects to port 2468, sends a string specified by the protocol, and determines whether it has received an expected response. You distribute this script to the same location on all servers on virtual host vh1, and then create a custom service monitor that uses that script. This provides not only verification of the connection, but a degree of content verification.
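A probe of the kind described above might look like the following Python sketch. Only the port number 2468 comes from the text; the request string, the expected reply, the function names, and the convention of returning 0 for healthy and 1 for unhealthy are all placeholders assumed for this example, to be replaced by whatever your application's protocol and the monitor configuration actually require.

```python
import socket

PORT = 2468             # port named in the example above
REQUEST = b"PING\r\n"   # hypothetical request defined by your protocol
EXPECTED = b"PONG"      # hypothetical prefix of a healthy reply


def reply_matches(reply: bytes, expected: bytes = EXPECTED) -> bool:
    """Content verification: does the reply start with the expected bytes?"""
    return reply.startswith(expected)


def probe(host: str, port: int = PORT, timeout: float = 5.0) -> int:
    """Return 0 if the service answers as expected, 1 otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(REQUEST)
            reply = conn.recv(1024)
    except OSError:
        # Could not connect, or the exchange timed out.
        return 1
    return 0 if reply_matches(reply) else 1
```

A script based on this sketch would finish with `sys.exit(probe(address))`, where `address` is the virtual host address being monitored.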
• MX_SERVER=IP address
  The primary address of the server that calls the script. The address is specified in dotted decimal format.
• MX_TYPE=(SERVICE|DEVICE)
  Whether the script is for a service or device monitor.
• MX_VHOST=IP address
  The IP address of the virtual host. The address is specified in dotted decimal format. (Applies only to service monitors.)
• MX_PORT=Port or name
  The port or name of the service monitor. (Applies only to service monitors.
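A script can pick up these variables from its environment. The following Python sketch gathers the four variables listed above into a dictionary; the function name and the dictionary keys are hypothetical, and the sketch simply leaves the service-only values empty when the script is called for a device monitor.

```python
import os
from typing import Dict, Mapping, Optional


def read_monitor_context(env: Mapping[str, str] = os.environ
                         ) -> Dict[str, Optional[str]]:
    """Collect the MX_* variables that Matrix Server passes to a script."""
    monitor_type = env.get("MX_TYPE")       # "SERVICE" or "DEVICE"
    context: Dict[str, Optional[str]] = {
        "server": env.get("MX_SERVER"),     # address of the calling server
        "type": monitor_type,
        "vhost": None,
        "port": None,
    }
    if monitor_type == "SERVICE":
        # MX_VHOST and MX_PORT apply only to service monitors.
        context["vhost"] = env.get("MX_VHOST")
        context["port"] = env.get("MX_PORT")
    return context
```

Passing the environment in as a parameter keeps the function easy to try out with a hand-built dictionary before the script is attached to a monitor.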
17 SAN Maintenance

The following information and procedures apply to SANs used with PolyServe Matrix Server.

Server Access to the SAN

When a server is either added to the matrix or rebooted, Matrix Server needs to take some administrative actions to make the server a full member of the matrix with access to the shared filesystems on the SAN. During this time, the PolyServe Management Console reports the message “Joining matrix” for the server.
• Repeated I/O errors when the server tries to write to a PSFS journal. The server then loses access to the affected filesystem. When the disk experiencing the I/O errors is fixed, the server will automatically regain access to the filesystem.

The PolyServe Management Console typically displays an alert message when a server loses access to the SAN. (See Appendix B for more information about these messages.
partitions need to be repaired. Also, if a network partition occurs, mxsanlk can be used to determine which network partition has control of the SAN.

Following is some sample output. The command was issued on host 10.10.30.3. The SDMP administrator is the administrator for the matrix to which the host belongs. There are three membership partitions.

mxsanlk
This host: 10.10.30.3
This host’s SDMP administrator: 10.10.30.
• locked, cannot access
  The host on which mxsanlk was run held the SANlock but is now unable to access it. The membership partition may need repair.
• trying to lock, not yet committed by owner
  The SANlock is either not held or has not yet been committed by its holder. The host on which mxsanlk was run is trying to acquire the SANlock.
• unlocked, trying to lock
  The SANlock does not appear to be held. The host on which mxsanlk was run is trying to acquire the SANlock.
• trying to lock (lock is corrupt, will repair)
  The host on which mxsanlk was run is trying to acquire the SANlock. The SANlock was corrupted but will be repaired.
• locked (lock is corrupt, will repair)
  The host on which mxsanlk was run holds the lock. The SANlock was corrupted but will be repaired.

If a membership partition cannot be accessed, use the mprepair program to correct the problem.
mprepair Utility

The mprepair utility can be used to repair any problems if a failure causes servers to have inconsistent views of the membership partitions. This utility is invoked from the operating system prompt.

NOTE: Matrix Server cannot be running when you use mprepair. To stop the matrix, issue the command net stop matrixserver from the Command Prompt.
INACCESSIBLE. The mprepair utility cannot access the device containing the membership partition.

CORRUPT. The partition is not valid.

MISMATCH. The membership partition is valid but its MP list does not match the server’s local MP list.

If the status is NOT FOUND or INACCESSIBLE, there may be a problem with the disk or with another SAN component. When the problem is repaired, the status should return to OK. If the status is CORRUPT, you should resilver the partition.
To fix this problem, use the --inactivate_mp option (described under “mprepair Options” below) to change the state of the membership partition to “inactive.” You can then import the disk into the matrix.

Sizes for Membership Partitions

Matrix Server stores the size of the smallest membership partition that was created during the Matrix Server installation. When you add or replace a membership partition, the new partition must be at least as large as that original partition.
The output shows the local membership partition list on the server where you are running mprepair. It then compares this list with the lists located on the disks containing the membership partitions. The output also includes the device database records for the disks containing the membership partitions. Following is an example.
However, in certain situations you may need to perform the resilver operation manually. For example, a membership partition might become corrupt or a local membership list might become out of date. To resilver from a particular partition, type the following command:

mprepair --resilver UID/PART#

UID is the UID for the device and PART# is the number of the partition on the device.
Change a Host Bus Adapter or Driver

Matrix Server and the psd driver must be disabled when you add, remove, or update a Host Bus Adapter or its driver. The following procedure describes how to change a Host Bus Adapter or driver on a matrix server. All commands are run from the Command Prompt.

1. Stop Matrix Server:
   net stop matrixserver
2. Disable the Matrix Server service:
   mxservice -uninstall
3. Remove the psd driver from the driver stack:
   psdcoinst -uninstall
4.
Change the Fencing Method

To change the fencing method, complete these steps:

1. Stop Matrix Server on all servers in the matrix.
2. Open the PolyServe Management Console Login window, enter the login credentials for a server in the matrix, and click the Configure button on the Login window.
3. On the Configure Matrix window, select the Fencing tab and then configure the appropriate fencing method.
Alternatively, if the server cannot be rebooted, but can be confirmed to have no access to the SAN, run 'mx server markdown ' to restore normal matrix operation.

The following example shows the operation of the command:

$ mx server markdown 99.10.20.4
This utility is used to verify that a server is down in the event that it cannot be fenced and cannot be rebooted.
IMPORTANT: This utility must be run only after the server has been physically verified to be down.
Server Cannot Be Located

If the matrix reports that it cannot locate a server on the SAN but you know that the server is connected, there may be an FC switch problem. On a Brocade FC switch, log into the switch and verify that all F-Port and L-Port IDs specified in switchshow also appear in the local nameserver, nsshow. If the lists of ports are different, reboot the switch. If the reboot does not clear the problem, there may be a problem with the switch.
Online Replacement of a FibreChannel Switch

This procedure applies only to sites using EMC PowerPath. When a matrix includes multiple FibreChannel switches, you can replace a switch without affecting normal matrix operations. The following conditions must be met when performing online replacement of a FibreChannel switch.

• The replacement switch must be the same model as the original switch and must have the same number of ports.
4. Back up the zone configuration information, either from the original switch or from another switch in the fabric. Use the cfgShow command and record its output.
5. Connect the power and either the Ethernet or the serial console cable to the new switch.
6. Log on to the new switch.
7. Disable the switch with the switchDisable command.
8. Disable any stale active configuration on the new switch with the cfgDisable command.
9.
Also verify that no zone conflicts are being reported on the inter-switch links (ISL). To ensure a highly available configuration after the switch has been replaced, verify that all servers have eligible I/O paths through the replaced switch. For example, you can use the PowerPath powermt command to do this.

Replace a McDATA FC Switch

To replace a McDATA FibreChannel switch, complete these steps:

1.
on the replacement switch should be removed to allow the fabric to properly communicate current zoning when the switch joins the fabric.
6. Add the private community to the Configure > Management > SNMP tab and ensure that it is write enabled.
7. Connect the FC connectors to the new switch. Be sure to plug them into the same ports as on the original switch.
8. Bring the switch online. The new switch should now connect to the rest of the fabric.
18 Other Matrix Maintenance

Although Matrix Server requires little special maintenance beyond that which is normally required for your servers and services, you may need to perform the following activities:

• Maintain the Matrix Server event log
• Disable a server for maintenance
• Troubleshoot a matrix
• Troubleshoot service and device monitors

Maintain the Matrix Server Event Log

Matrix Server stores its log messages in the Matrix Server event log on each server in the matrix.
Windows Event Viewer

Select Start > Programs > Administrative Tools > Event Viewer, and then click on Matrix Server to see the log messages. You can use the options on the Action menu to manipulate the event log.

PolyServe Management Console

You can also use the PolyServe Management Console to view or maintain the event log on each server. The changes you make affect only the event log for the server selected on the Servers window.
View the Event Log

To view the event log for a specific server, select that server, right-click, and then select View Log. The Server Log window displays the most recent messages from the event log. You can select the types of messages that you want to view by checking or unchecking the boxes at the top of the window. Use the scroll bars to move up, down, left, and right in the file, allowing you to see entire messages without resizing the window.
Audit Administrative Commands

Matrix Server provides an audit feature that can be used to log administrative commands in the event log. The following types of commands are audited:

• Login authentication for the PolyServe Management Console and mx commands
• Commands invoked via the PolyServe Management Console
• mx commands, with the exception of status commands

The audit entry contains the IP address and port number of the client TCP connection.
The -l level option specifies the severity of the message. level can be ERROR, WARNING, INFO, EVENT, FATAL, FAILUREAUDIT, SUCCESSAUDIT, TRACE, or DEBUG. The -d option allows you to specify a numeric message ID. The default is 100. The -G option specifies that the message is global; the -L option specifies that it is local. The default is local. If the log-text contains special characters, it must be enclosed in quotation marks.
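The option rules above can be sketched as a small validation helper. This is a minimal sketch, not the product's implementation: the helper name build_log_options, the order of the options, and the use of Python are all assumptions. Only the option letters, the level names, and the default message ID of 100 come from the text above. The command that accepts these options is not shown in this excerpt, so the helper returns only the option list.

```python
# Severity levels listed in the text above.
VALID_LEVELS = {
    "ERROR", "WARNING", "INFO", "EVENT", "FATAL",
    "FAILUREAUDIT", "SUCCESSAUDIT", "TRACE", "DEBUG",
}


def build_log_options(level, log_text, message_id=100, is_global=False):
    """Return the option list for one log message (hypothetical helper)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level: {level}")
    # -L (local) is the documented default; -G marks the message as global.
    scope = "-G" if is_global else "-L"
    # The option order is an assumption; check the command synopsis.
    return ["-l", level, "-d", str(int(message_id)), scope, log_text]
```

Passing the list as separate arguments (for example, to subprocess.run) keeps log text that contains spaces or special characters as a single argument, which serves the same purpose as the quotation marks on an interactive command line.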
Online Incident Report form. The form and the contact numbers are available on the following PolyServe Web site:

http://www.polyserve.com/support.php

Run mxcollect

You will need to run the mxcollect utility on each node. The utility is located in the %SYSTEMROOT%\Polyserve\MatrixServer\tools folder. Go to this location and double-click the file mxcollect.exe. You will then see a command window that says “Collecting files,” as in the following example.
Upload mxcollect Files to PolyServe Technical Support

After running mxcollect, you can upload the resulting files to PolyServe Technical Support. The ftp account is at ftp.polyserve.com. If you do not have an ftp account or have lost your ftp password, contact PolyServe Technical Support as described above under “Create an FTP Account.
Check the Server Configuration

The Matrix Server mxcheck utility can be used to verify that a server meets the configuration requirements for Matrix Server. The utility is run automatically whenever Matrix Server is booted. You can also run mxcheck manually. It is located in the installation directory, which is typically %SystemDrive%\Program Files\PolyServe\MatrixServer\bin.
Troubleshoot Matrix Problems

Matrix Server Fails to Start

If the Matrix Server service fails to start on a server, check that the domain name of the server is configured in the DNS suffix list. The DNS suffix list is configurable via the Advanced TCP/IP Properties.

The Server Status Is “Down”

If a server is running but Matrix Server shows it as down, follow these diagnostic steps:
1. Verify that the server is connected to the network.
2.
A Virtual Host Is Inaccessible

If a site is inaccessible but Matrix Server indicates that it is okay, verify that the service is running and that the actual data exists on the server providing the network interface currently used by the virtual host.

Matrix Server Exits Immediately

If the ClusterPulse process exits immediately on starting, first determine whether the process is running.
“Down” Status

The “Down” status indicates that the monitor finished its probe, but the probe did not complete successfully. Depending on the monitor type (such as HTTP or SMTP), the service monitor probe may involve more than being able to connect to the network service. For many built-in service monitors, Matrix Server may conclude that the monitor is down even if the TCP connection succeeds.
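The difference between a successful TCP connection and a successful protocol-level probe can be illustrated with a short sketch. This is not the logic used by the built-in monitors, which this guide does not describe; the function names, the timeout values, and the rule that any 2xx or 3xx HTTP status counts as "Up" are assumptions chosen for illustration.

```python
import http.client
import socket


def tcp_connects(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(host, port, path="/", timeout=5.0):
    """Return "Up" only if the service answers an HTTP request sensibly."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a reply that is not valid HTTP.
        return "Down"
    finally:
        conn.close()
    # Assumed rule: 2xx and 3xx responses count as a healthy service.
    return "Up" if 200 <= status < 400 else "Down"
```

A service that accepts connections but never answers requests makes tcp_connects return True while http_probe returns "Down", which mirrors the situation described above.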
The events are:

CONFIG_ERROR. A script must exist and be executable by root. This condition is checked for probe, Start, Stop, and Recovery scripts each time an attempt is made to execute the script.

SCRIPT_SYSERR. The monitor_agent tried to fork a process to execute the script but the fork system call failed. This condition can occur when Matrix Server is trying to execute probe, Start, Stop, or Recovery scripts.

START_TIMEOUT.
Clear an Error

After you have determined the cause for a script event, be sure to correct the script on all servers that reported the event. You should then clear the error. On the Management Console, select the monitor, right-click, and select Clear Last Error. To clear an error from the command line, use these commands:

mx service clear ...
mx device clear ...
probe script will be executed at the probe frequency. Inactive status means that the monitor is not the only one currently providing the service. For example, when the primary server is functioning normally, a monitor on a backup server may show an Inactive status. The probe script is still executed in the Inactive state.
Port   Transport Type   Description
9050   TCP              Proprietary connection from the Management Console (configurable, as described below)
9070   TCP              HTTP connection from the Management Console (fixed, IANA registration has been applied for)
9071   TCP              HTTPS connection from the Management Console (fixed, IANA registration has been applied for)

Internal Network Port Numbers

The following network port numbers are used for internal, server-to-server communication.
A Management Console Icons

The Management Console uses the following icons.

Matrix Server Entities

The following icons represent the Matrix Server entities. If an entity is disabled, the color of the icon becomes less intense.
Additional icons are added to the entity icon to indicate the status of the entity. The following example shows the status icons for the server entity. The status icons are the same for all entities and have the following meanings.

Monitor Probe Status

The following icons indicate the status of service monitor and device monitor probes. If the monitor is disabled, the color of the icons is less intense.
On the Applications tab, virtual hosts and single-active monitors use the following icons to indicate the primary and backups. Multi-active monitors use the same icons but do not include the primary or backup indication.

Management Console Alerts

The Management Console uses the following icons to indicate the severity of the messages that appear in the Alert window.
B Error and Event Log Messages

When certain errors occur, Matrix Server writes messages to the PolyServe Management Console. Other error messages are written to the Matrix Server event log.

Management Console Alert Messages

NN.NN.NN.NN has lost a significant portion of its SAN access, possibly due to a SAN hardware failure

The specified server is unable to write to any of the membership partitions. Ensure that the server can access the membership partitions and also has write access to them.
NN.NN.NN.NN should be rebooted ASAP as it stopped matrix network communication DATE HH:MM:SS and was excluded from the SAN to protect filesystem integrity

The server was excluded from the matrix because it could no longer communicate over the network. The server should be rebooted at the first opportunity. Also check the network and make sure that the server is not experiencing a resource shortage.

NN.NN.NN.
Error connecting to server : unknown host

The server identified as is not responding to the connection request from the Management Console. Verify that you typed a valid hostname or IP address in the login window. This error may indicate that the ClusterPulse process is not running; restart Matrix Server on the server.
Fencing operation failed, reboot NN.NN.NN.NN ASAP. NN.NN.NN.NN stopped matrix communication DATE HH:MM but cannot be excluded from the matrix because of a networking or fencing hardware failure or misconfiguration. To protect filesystem integrity, some or all filesystem operations may be paused until NN.NN.NN.NN is rebooted or until fencing operations can be performed.
If you receive one of these messages, report it to PolyServe Technical Support at your earliest opportunity.

Majority of membership partitions are unwritable, possibly due to a SAN or storage hardware failure. As a result, disk imports and deports cannot be done, and some servers may be unable to mount filesystems. In addition, Matrix Server’s ability to recover from a future server failure is compromised.
Matrix unable to take control of SAN, because another matrix that includes NN.NN.NN.NN currently controls the SAN. Possibly a networking failure or misconfiguration has partitioned these servers from the servers that control the SAN, or possibly this matrix has been misconfigured to share membership partitions with another matrix. Check the matrix configuration and add the server if it is not currently a member.
Matrix unable to take control of SAN. As a result, some or all filesystem operations may be paused throughout the matrix. In addition, filesystem mounts and unmounts and disk imports and deports cannot be performed. Verify that this message, not one of the previous messages, is displayed. Also verify that the servers can access the membership partitions and have write access to them, and that the servers can communicate with the FibreChannel switch.
psdNpN on NN.NN.NN.NN is stalled on locks from NN.NN.NN.NN

A DLM lock request has been outstanding for a long period of time on the specified server. Probably the server is severely overloaded or is experiencing a resource shortage. As a last resort, reboot the server to clear the problem.
Singleton matrix unable to take control of SAN, because the matrix that includes NN.NN.NN.NN currently controls the SAN. Possibly this server has not been added to the matrix or has been deleted from the matrix, or possibly a networking failure or misconfiguration has partitioned this server from the servers that control the SAN. Check the matrix configuration and add the server if it is not currently a member.
ClusterPulse Messages

Bad command -- Could not find device monitor instance for XXX on server YYY

The monitor_agent process is reporting status on a device monitor with device name XXX on server YYY but the ClusterPulse process does not recognize this device. Probably the Management Console has removed the device monitor and monitor_agent has already sent the status to ClusterPulse. Therefore, no corrective action is required.
Internal system error -- Internal select error at server X.X.X.X: [select ?] with errno of N

The ClusterPulse process received a system error. Report this error to PolyServe Technical Support at your earliest opportunity.

License error -- LICENSE ERROR ON SERVER %s: %s; clusterpulse WILL BE TERMINATED IN %d HOURS %d MINUTES

The ClusterPulse process has recognized a license violation. This message will be repeated every 15 minutes.
Network error -- set_readable called with unknown socket N
Network error -- set_writeable called with unknown socket N

If you receive this message, notify PolyServe Technical Support at your earliest convenience.

Object not found -- could not find service monitor instance: IP X.X.X.X port N

The ClusterPulse process received a status message from the monitor_agent process regarding the service monitor for virtual host address IP X.X.X.
Script error -- Matrix Server cannot invoke a non executable agent monitor_agent

Verify that the execute permission on monitor_agent is set correctly.

Script error -- Matrix Server daemon received an illegal reply from the agent monitor_agent:

The ClusterPulse process received a reply buffer from the monitor_agent process that could not be parsed. Check the reply buffer for uppercase strings.
PSFS Filesystem Messages

If you receive a panic message from the PSFS filesystem, report it to PolyServe Technical Support at your earliest convenience. Then reboot the affected server to recover from the error condition.

Distributed Lock Manager Messages

The Distributed Lock Manager (DLM) generates error messages if it detects that a filesystem operation will block indefinitely because of an internal error.
SCL Messages

If messages such as the following appear in the Matrix Server event log, the matrix may not be able to start up properly. Consult PolyServe Technical Support for assistance.

CRITICAL ERROR: Unable to fence host xxxxxxxx xxxxxxx: xxxx (switch=xxxx)

Network Interface Messages

The PanPulse process writes messages to the Matrix Server event log about the state of the network interfaces configured in the matrix.
PanPulse should receive matrix traffic at certain intervals on the active administrative network interface. If PanPulse does not receive any traffic during this period of time, it will report the following message:

No traffic received on active panpulse interface ()

PanPulse then fails over the active interface to another network interface and reports the following.
Operational Problems

When fabric fencing is configured on the matrix, errors such as the following can appear in the Alert panel on the PolyServe Management Console.

FenceAgent : disabled status

The error indicates that there is an operational problem with the specified FibreChannel switch (192.168.10.74 in the following example).

192.168.10.100 [Critical ] [2006-03-13 15:22:59] ClusterPulse SERVERS Alert - FenceAgent 192.168.10.
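When many alerts need to be reviewed, a line with the layout of the example above can be split into its fields. The following is a minimal sketch that assumes every alert line follows that layout (reporting server, bracketed severity, bracketed timestamp, then free text). The field names and the parse_alert function are hypothetical and are not part of Matrix Server.

```python
import re
from datetime import datetime

# Layout assumed from the example alert: server [severity] [timestamp] text
ALERT_RE = re.compile(
    r"^(?P<server>\S+)\s+"
    r"\[(?P<severity>[^\]]*)\]\s+"
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+"
    r"(?P<message>.*)$"
)


def parse_alert(line):
    """Split one alert line into fields; return None if it does not match."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    return {
        "server": match.group("server"),
        # The severity is padded inside the brackets, so trim it.
        "severity": match.group("severity").strip(),
        "timestamp": datetime.strptime(
            match.group("timestamp"), "%Y-%m-%d %H:%M:%S"
        ),
        "message": match.group("message"),
    }
```

Lines that do not match the assumed layout return None, so a caller can set them aside for manual review instead of guessing at their structure.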
Default VSAN Is Disabled on Cisco MDS Switch

If you are using a Cisco MDS FibreChannel switch with firmware version 2.1 or later, you may see alerts such as the following on the PolyServe Management Console:

FenceAgent : no response to queries from

You might also see additional logging in psSAN.log:

Transient Error: fabric.