Specifications

Certain types of applications, such as backup solutions, add a further complication. In these cases, the application vendor also publishes its own matrix of tested, supported hardware.

10.4 Separation of Production and Test Segments

A test environment is useful in many situations. A key requirement for this testbed is that it contain at least one of the production environment’s critical devices. It is impossible for any hardware vendor to test every combination of hardware and software, much less extend test scenarios to include application software. It can therefore be very helpful to use the testbed to check new code releases against applications that the staff know to be particularly troublesome. The testbed can be as small as a single server with a pair of HBAs mapped to some old, slow SAN storage and a cast-off switch. Such a setup would be adequate for some internal tests on a new failover driver.

Conversely, the test environment can be as elaborate as a miniature recreation of the whole SAN system. This can include a dedicated storage system mapped from each brand and/or model of installed storage controller; similar, albeit smaller, versions of the switches used in the production environment; and servers “beefy” enough to run application simulations. Such a setup can be used for application testing and development, as well as general testing. Many shops give their internal application development teams such setups and use them as a “sandbox” for planned changes to their production environments.
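
One way to keep such a testbed honest is to compare its inventory against the production inventory and list any device models that have no counterpart. The following is only a minimal sketch: the function name and the model strings are hypothetical placeholders, and a real inventory would more likely come from a configuration database than from hard-coded lists.

```python
def missing_from_testbed(production_models, testbed_models):
    """Return production device models that have no counterpart in the testbed."""
    return sorted(set(production_models) - set(testbed_models))


# Hypothetical inventories, for illustration only.
production = ["controller-model-A", "controller-model-B", "switch-model-X"]
testbed = ["controller-model-A", "switch-model-X"]

# Any model listed here cannot be exercised before a production change.
print(missing_from_testbed(production, testbed))
```

An empty result would indicate that every production model listed has a counterpart in the testbed. It says nothing about firmware levels or topology, which would need separate checks.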

Why is such a test environment necessary? There are many answers to this question; multipath software is one example. Multipath software is pivotal to ensuring that servers maintain redundant paths to storage. Without properly functioning multipathing, much of the benefit of the dual physical SAN infrastructure is lost. IBM designs and performs test plans to ensure that during a given path failure, no errors are reported by the operating system. However, the failover code may take a few moments to verify that a path is truly down, attempting some number of I/O retries before failing the I/O stream over to a different path. For this reason, it is not at all uncommon for some applications to suffer I/O timeouts before any OS timer “fires” and issues the appropriate error messages. Certainly some application testing takes place, but it is simply impossible for IBM to test all scenarios across every hardware and application family.
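
The timing problem described above can be illustrated with simple arithmetic. The sketch below assumes a deliberately simplified model in which the failover driver waits a full per-I/O timeout on the initial attempt and on each retry before abandoning the path. The function names and all numbers are hypothetical and do not describe any particular multipath driver; actual values should be measured on the testbed.

```python
def worst_case_failover_seconds(io_timeout_s, retries):
    """Time before the path is abandoned, assuming the initial attempt
    and every retry each wait the full per-I/O timeout (simplified model)."""
    return io_timeout_s * (1 + retries)


def app_times_out_first(app_timeout_s, io_timeout_s, retries):
    """True if the application gives up before failover completes."""
    return app_timeout_s < worst_case_failover_seconds(io_timeout_s, retries)


# Hypothetical values: 30 s per-I/O timeout, 3 retries, 60 s application timeout.
failover = worst_case_failover_seconds(io_timeout_s=30, retries=3)
print(failover)                          # 30 * (1 + 3) = 120
print(app_times_out_first(60, 30, 3))    # True: 60 s < 120 s
```

Under these assumed numbers the application would report an I/O timeout well before the failover finishes, even though the operating system itself never logs an error. This is the kind of interaction a testbed can expose.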