SANworks by Compaq
Application Notes – Data Replication Manager Windows-Based Server with Single Host Bus Adapter
Part Number: AA-RS1XA-TE
First Edition (March 2002)
Product Version: ACS Version 8.6-4P
This document details an extended configuration that is available for the SANworks™ Data Replication Manager by Compaq. This disaster-tolerant solution provides controller-based mirroring across an extended Fibre Channel link, and is for those customers who wish to use Microsoft Windows 2000/NT 4.
© 2002 Compaq Information Technologies Group, L.P. Compaq, the Compaq logo, SANworks, StorageWorks, Tru64, and OpenVMS are trademarks of Compaq Information Technologies Group, L.P. in the U.S. and/or other countries. Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the U.S. and/or other countries. UNIX is a trademark of The Open Group in the U.S. and/or other countries. All other product names mentioned herein may be trademarks of their respective companies.
Application Notes Contents
These Application Notes cover the following major topics:
• Introduction
• The Single FC HBA Solution Configuration
  — Configuration
  — Software and Firmware
  — Data Replication Manager Configuration
• Solution Behavior
  — The Normal Situation
  — Failure Modes
• Additional Considerations
  — Storage Area Network (SAN) Management
  — Multiple Switches in the FC Fabric
Introduction
• SANworks Data Replication Manager by Compaq HSG80 ACS Version 8.6-4P Failover/Failback Procedures Guide, part number AA-RPJOC-TE
• SANworks Data Replication Manager by Compaq HSG80 ACS Version 8.6-4P Release Notes, part number AA-RPJ2C-TE
• SANworks Data Replication Manager by Compaq HSG80 ACS Version 8.6-1P Scripting User Guide, part number EK-DRMSC-OA. C01
• Compaq StorageWorks SAN Switch Zoning Reference Guide, part number EK-P20ZG-GA
• Compaq SANworks Secure Path Version 3.
The Single FC HBA Solution Configuration
The single FC HBA configuration contains the following single points of failure (SPOFs):
• The FC HBA within the server
• The fiber optic cable running between the server and the FC switch
• The gigabit interface converter (GBIC), located within the FC switch
• The FC switch on the local site
• The FC switch on the remote site
• The intersite link (ISL)
Some of these SPOFs can be mitigated, even with single FC HBA servers.
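The list above can be checked mechanically. The following Python sketch is an illustration only, not part of the Compaq tooling: it models the configuration as an undirected graph and reports every component whose loss disconnects the server from the remote-site storage. The component names, the link list, and the choice of end-to-end reachability as the failure criterion are all assumptions made for this example.

```python
from collections import deque


def reachable(links, start, goal, failed=None):
    """Breadth-first search over undirected links, skipping one failed component."""
    if failed in (start, goal):
        return False
    adjacency = {}
    for a, b in links:
        if failed in (a, b):
            continue  # a failed component takes all of its links with it
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links, start, goal):
    """Return the components whose individual failure disconnects start from goal."""
    components = {node for link in links for node in link} - {start, goal}
    return sorted(c for c in components
                  if not reachable(links, start, goal, failed=c))


# Hypothetical model of the single FC HBA chain (names are illustrative).
SINGLE_HBA_LINKS = [
    ("server", "hba"),
    ("hba", "cable"),
    ("cable", "gbic"),
    ("gbic", "local_switch"),
    ("local_switch", "isl"),
    ("isl", "remote_switch"),
    ("remote_switch", "remote_storage"),
]

if __name__ == "__main__":
    print(single_points_of_failure(SINGLE_HBA_LINKS, "server", "remote_storage"))
```

Because every component in this model sits in series, each one is reported as a SPOF. Adding a second, parallel path to the link list (for example a second switch) removes the bypassed components from the result.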
Solution Behavior
This section describes the setup of the solution in the Normal state and the behavior of the total solution when certain components fail.
The Normal Situation
Before you can determine that a failure has occurred, you need to know what the Normal state looks like. Throughout this document, a server named mitersaw is used. The HSG80 presents three storage units (D1, D2, and D3) to this server. Each unit is a remote copy set (RCS).
The menu selections for the first three settings are shown in Figure 3.
Figure 3: Secure Path Manager Properties menu settings
Failure Modes
Failure of Server
If the server fails, the initiator site loses access to the storage system. To guard against this event, the server should be part of a Microsoft cluster with another server. This allows failover of cluster groups from one server to another server in case of a hardware component failure.
Failure of Cable Between FC Switch and HSG80 Host Port 1 (Left Port) at Local Site
In this case, Secure Path detects that the path is broken and selects an alternate path. When the broken path is repaired, the units that are on the alternate controller (drives K and M in Figure 4) fail back to the original controller.
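The failover and failback behavior described above can be pictured with a small Python model. This is a simplified sketch and does not reflect how Secure Path is implemented: the function name, the controller labels, and the rule "use the preferred path whenever it is healthy" are assumptions made for illustration.

```python
def active_path(preferred, alternate, healthy_paths):
    """Pick the path a unit would use, given the set of currently healthy paths.

    Returns the preferred path when it is healthy (this covers failback after
    a repair), otherwise the alternate path, otherwise None.
    """
    if preferred in healthy_paths:
        return preferred
    if alternate in healthy_paths:
        return alternate
    return None


if __name__ == "__main__":
    # Hypothetical labels for the two controllers of an HSG80 pair.
    print(active_path("top", "bottom", {"top", "bottom"}))  # normal state
    print(active_path("top", "bottom", {"bottom"}))          # cable broken
    print(active_path("top", "bottom", {"top", "bottom"}))  # cable repaired
```

The model is stateless: it returns to the preferred path as soon as that path is healthy again, which mirrors the failback to the original controller once the cable is repaired.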
Example Display 1 shows the screen display from the SHOW THIS and SHOW OTHER commands. The PORT_2_TOPOLOGY field differs between the two HSG80 controllers, which indicates that one of the controllers is unable to communicate with the remote site. Note that the PORT_2_TOPOLOGY value for this controller is fabric up, while for the other controller it is connection down.
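The comparison described above can also be run against captured SHOW THIS and SHOW OTHER text. The Python sketch below is illustrative only and is not a Compaq utility: the function names and sample strings are hypothetical, and the regular expression assumes the field is printed in the form PORT_2_TOPOLOGY = FABRIC (state), as in Example Display 1.

```python
import re

# Matches e.g. "PORT_2_TOPOLOGY = FABRIC (connection down)".
_TOPOLOGY = re.compile(r"PORT_2_TOPOLOGY\s*=\s*(\w+)\s*\(([^)]*)\)", re.IGNORECASE)


def port_2_state(show_output):
    """Return the parenthesised port 2 state in lower case, or None if absent."""
    match = _TOPOLOGY.search(show_output)
    if match is None:
        return None
    return match.group(2).strip().lower()


def controllers_disagree(show_this, show_other):
    """True when the two captured displays report different port 2 states."""
    return port_2_state(show_this) != port_2_state(show_other)


if __name__ == "__main__":
    # Hypothetical captures built from the values quoted in the text.
    this_controller = "PORT_2_TOPOLOGY = FABRIC (fabric up)"
    other_controller = "PORT_2_TOPOLOGY = FABRIC (connection down)"
    print(controllers_disagree(this_controller, other_controller))
```

The sketch only parses text that has already been captured from the controller console; how that text is collected is outside its scope.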
        Reported PORT_ID = 5000-1FE1-0002-75A2
        PORT_2_TOPOLOGY = FABRIC (connection down)
        REMOTE_COPY = ATHENS
Cache:
        256 megabyte write cache, version 0012
        Cache is GOOD
        No unflushed data in cache
        CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
Mirrored Cache:
        256 megabyte write cache, version 0012
        Cache is GOOD
        No unflushed data in cache
Battery:
        NOUPS
        FULLY CHARGED
        Expires: 25-MAR-2004
Athens_Top>
If one of the HSG80 controllers is still capable of communicating with its peer controller on the remote
Failure of HSG80 at Remote Site
If an HSG80 controller at the remote site fails, the peer controller at the local site is unable to communicate with the failed controller at the remote site. When the failure is detected (after 20 to 25 seconds), the controller at the local site fails over all units to the other controller at the local site.
Additional Considerations Failure of Intersite Link (ISL) If the ISL is lost, the behavior of the total solution depends on the actual configuration of the local HSG80 controller. There are three possible configurations: • REMOTE COPY SET ERROR_MODE=NORMAL — Association Set With Logdisk: The units remain operational on the current controllers and operation continues. When the ISL is restored, only the updates as of the moment of ISL failure will be sent to the remote site.
• Number of hops between the server and the host port 2 of the HSG80s.
NOTE: Only one hop may be a long distance hop.
It is common practice to spread the number of servers equally over the number of FC switches. Host port 1 of top and bottom controllers in an HSG80 controller pair must be connected to different switches (if available) to increase the availability of the total solution.
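The two cabling guidelines above lend themselves to a simple planning check. The Python sketch below is a hypothetical aid, not a Compaq tool: the function names, the round-robin assignment, and the data layout (a mapping from controller pair name to the switches used by the top and bottom host port 1) are assumptions for this example.

```python
from itertools import cycle


def spread_servers(servers, switches):
    """Assign servers to FC switches round-robin so the load is spread evenly."""
    return {server: switch for server, switch in zip(servers, cycle(switches))}


def pairs_sharing_a_switch(host_port_1):
    """List controller pairs whose top and bottom host port 1 use the same switch.

    host_port_1 maps a pair name to a (top_switch, bottom_switch) tuple.
    """
    return sorted(pair for pair, (top, bottom) in host_port_1.items()
                  if top == bottom)


if __name__ == "__main__":
    # Hypothetical server, switch, and controller pair names.
    print(spread_servers(["srv1", "srv2", "srv3", "srv4"], ["sw_a", "sw_b"]))
    print(pairs_sharing_a_switch({
        "pair_1": ("sw_a", "sw_b"),  # follows the guideline
        "pair_2": ("sw_a", "sw_a"),  # both ports on one switch
    }))
```

Any pair returned by the second function depends on a single switch for host access and is a candidate for recabling when a second switch is available.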
In larger configurations, the combination of the use of single FC HBA server clusters and multiple FC switches and the spreading of host port 1 of the HSG80 controllers over more than one FC switch results in a solution that has no SPOF. This configuration is shown in Figure 9. In this case, a failure of the right FC switch causes the replication process to cease, but operation continues.
Dual FC HBA Servers Running Microsoft Windows NT 4.0 / 2000
If a server with dual FC HBAs is connected to the SAN, zoning the FC fabric becomes mandatory, as shown in Figure 10. A minimum of four zones must be created:
• A zone containing port 1 of the top controllers in each HSG80 controller pair, as well as one FC HBA from each dual FC HBA server. In the example, this is the "Red zone.
Compaq recommends connecting each FC HBA of the dual FC HBA servers to a different switch, if possible. Cabling in this manner ensures that a loss of an FC switch does not result in a dual FC HBA server losing access to its storage.
Compatibility with Other Operating Systems
Servers running other operating systems, such as Compaq Tru64 UNIX, Compaq OpenVMS, or Sun Solaris, require the presence of at least two FC HBAs.
Conclusion
The functionality of single FC HBA servers in a DRM environment has been verified successfully. No changes to any of the software components are necessary. Single FC HBA server features are supported as follows:
• Support for single-attached servers running Windows NT 4.0 and Windows 2000 in a DRM configuration.
• Support for single-attached servers running Windows NT 4.0 and Windows 2000 in a single SAN with dual-attached servers in a DRM configuration.
Glossary
dual-redundant configuration
A storage subsystem configuration that consists of two active controllers operating as a single controller. If one controller fails, the other assumes control of the failed controller's devices.
fabric
A network of one or more Fibre Channel switches.
failsafe locked
The failsafe error mode can be enabled by the user to fail any write I/O whenever the target is inaccessible or the initiator unit fails.
latency
The amount of time required for a transmission to reach its destination.
link
A connection between two adjacent Fibre Channel ports, consisting of a transmit fiber and a receive fiber. An example is the connection between the Fibre Channel switch port and the HSG80 controller.
local site
For subsystems using the disaster tolerant Data Replication Manager solution, the local site is the SAN that is the primary source of information.
SCSI
Small Computer System Interface. A processor-independent, standard protocol for system-level interfacing between a computer and intelligent devices, including hard drives, floppy disks, CD-ROMs, printers, scanners, and others.
single-mode fiber (SMF)
Optical fiber that is designed for the transmission of a single ray or mode of light. Used for long distance signal transmission.
SNMP trap
Simple Network Management Protocol, an industry standard.