System information
Resolve Path Thrashing
Use this procedure to resolve path thrashing. Path thrashing occurs on active/passive arrays when two hosts
access the same LUN through different storage processors (SPs) and, as a result, the LUN is never actually available.
Procedure
1 Ensure that all hosts sharing the same set of LUNs on the active/passive arrays use the same storage
processor.
2 Correct any cabling inconsistencies between different ESX/ESXi hosts and SAN targets so that all HBAs
see the same targets in the same order.
3 Configure the path to use the Most Recently Used PSP (the default).
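The check in step 1 can be sketched in a few lines of Python. This is an illustration only, not a VMware tool: the inventory mapping, the host and LUN names, and the function find_split_luns are hypothetical, and in practice the data would come from whatever path inventory you keep for your hosts.

```python
from collections import defaultdict


def find_split_luns(active_sp):
    """Return the LUNs that different hosts reach through more than one SP.

    active_sp maps (host, lun) to the name of the SP used by that host's
    active path. The structure is an assumption made for this sketch.
    """
    sps_by_lun = defaultdict(set)
    for (_host, lun), sp in active_sp.items():
        sps_by_lun[lun].add(sp)
    # A LUN reached through two or more SPs is a path thrashing candidate.
    return {lun: sorted(sps) for lun, sps in sps_by_lun.items() if len(sps) > 1}


# Hypothetical inventory: two hosts, two shared LUNs.
inventory = {
    ("esx-a", "lun0"): "SP-A",
    ("esx-b", "lun0"): "SP-B",  # differs from esx-a, so lun0 is flagged
    ("esx-a", "lun1"): "SP-A",
    ("esx-b", "lun1"): "SP-A",
}

print(find_split_luns(inventory))  # {'lun0': ['SP-A', 'SP-B']}
```

An empty result suggests that all hosts in the inventory reach each shared LUN through the same SP.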
Understanding Path Thrashing
The SPs in a storage array are like independent computers that have access to some shared storage. Algorithms
determine how concurrent access is handled.
For active/passive arrays, all the sectors on the storage that make up a given LUN can be accessed by only one
SP at a time. The LUN ownership is passed around between the storage processors. The reason is that storage
arrays use caches and SP A must not write anything to disk that invalidates the SP B cache. Because the SP has
to flush the cache when it finishes the operation, it takes a little time to move the ownership. During that time,
no I/O to the LUN can be processed by either SP.
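The ownership rule can be modeled with a small Python class. This is a simplified sketch, not a description of any particular array: the class name ActivePassiveLun, the transfer_ms parameter, and the 50 ms figure are assumptions chosen only to make the behavior visible.

```python
class ActivePassiveLun:
    """Toy model of a LUN that only one SP can own at a time."""

    def __init__(self, owner, transfer_ms):
        self.owner = owner              # SP that currently owns the LUN
        self.transfer_ms = transfer_ms  # assumed time to flush cache and move ownership
        self.busy_until = 0.0           # ownership is in transit before this time

    def move_ownership(self, new_owner, now_ms):
        """Move ownership to new_owner; I/O is blocked until the move ends."""
        if new_owner != self.owner:
            self.owner = new_owner
            self.busy_until = max(self.busy_until, now_ms) + self.transfer_ms

    def can_serve(self, sp, now_ms):
        """An SP can process I/O only if it owns the LUN and no move is in progress."""
        return sp == self.owner and now_ms >= self.busy_until


lun = ActivePassiveLun(owner="SP-A", transfer_ms=50)
lun.move_ownership("SP-B", now_ms=0)
print(lun.can_serve("SP-A", now_ms=10))  # False: SP-A no longer owns the LUN
print(lun.can_serve("SP-B", now_ms=10))  # False: the move is still in progress
print(lun.can_serve("SP-B", now_ms=50))  # True: the move has completed
```

In this model, neither SP can serve I/O between time 0 and time 50, which corresponds to the window described above.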
Some active/passive arrays attempt to look like active/active arrays by passing the ownership of the LUN to
the various SPs as I/O arrives. This approach works in a clustering setup, but if many ESX/ESXi systems access
the same LUN concurrently through different SPs, the result is path thrashing.
Consider how path selection works:
■ On an active/active array, the ESX/ESXi system starts sending I/O down the new path.
■ On an active/passive array, the ESX/ESXi system checks all standby paths. The SP of the path that is
  currently under consideration sends information to the system on whether it currently owns the LUN.
  ■ If the ESX/ESXi system finds an SP that owns the LUN, that path is selected and I/O is sent down that
    path.
  ■ If the ESX/ESXi host cannot find such a path, the ESX/ESXi host picks one of the standby paths and
    sends the SP of that path a command to move the LUN ownership to that SP.
Path thrashing can occur as a result of the following path choice: If server A can reach a LUN only through
one SP, and server B can reach the same LUN only through a different SP, they both continually cause the
ownership of the LUN to move between the two SPs, effectively ping-ponging the ownership of the LUN.
Because the system moves the ownership quickly, the storage array cannot process any I/O (or can process
only very little). As a result, any servers that depend on the LUN will experience low throughput due to
the long time it takes to complete each I/O request.
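The path selection rules and the resulting ping-pong effect can be illustrated with a short Python simulation. This is a sketch under stated assumptions, not the ESX/ESXi implementation: the functions select_path and count_ownership_moves, the host and SP names, and the choice to pick the first standby path are all hypothetical.

```python
def select_path(array_type, paths, owner):
    """Choose a path for one host and return (chosen_sp, new_owner).

    paths is the list of SPs the host can reach, in the order they are
    checked. owner is the SP that currently owns the LUN.
    """
    if array_type == "active/active":
        # Any path can carry I/O, so ownership does not change.
        return paths[0], owner
    for sp in paths:
        if sp == owner:
            # Found a path whose SP already owns the LUN.
            return sp, owner
    # No reachable SP owns the LUN: pick a standby path (here, the first
    # one) and ask its SP to take ownership.
    chosen = paths[0]
    return chosen, chosen


def count_ownership_moves(host_paths, owner, rounds):
    """Count how often LUN ownership moves when hosts take turns issuing I/O."""
    moves = 0
    for _ in range(rounds):
        for paths in host_paths.values():
            _, new_owner = select_path("active/passive", paths, owner)
            if new_owner != owner:
                moves += 1
                owner = new_owner
    return moves


# Each server reaches the LUN through a different SP only: ownership ping-pongs.
split = {"server-a": ["SP-A"], "server-b": ["SP-B"]}
print(count_ownership_moves(split, owner="SP-A", rounds=3))   # 5

# Both servers can reach both SPs: ownership never has to move.
shared = {"server-a": ["SP-A", "SP-B"], "server-b": ["SP-A", "SP-B"]}
print(count_ownership_moves(shared, owner="SP-A", rounds=3))  # 0
```

In the split configuration, almost every I/O attempt forces an ownership move. In the shared configuration, both hosts settle on the SP that already owns the LUN.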
Equalize Disk Access Between Virtual Machines
You can adjust the maximum number of outstanding disk requests with the Disk.SchedNumReqOutstanding
parameter in the vSphere Client. When two or more virtual machines are accessing the same LUN, this
parameter controls the number of outstanding requests that each virtual machine can issue to the LUN.
Adjusting the limit can help equalize disk access between virtual machines.
This limit does not apply when only one virtual machine is active on a LUN. In that case, the bandwidth is
limited by the queue depth of the storage adapter.
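The rule described above can be summarized in a small Python function. This is a simplified model for illustration, not VMkernel code: the function name per_vm_outstanding_limit and the values 32 and 64 are hypothetical, and the model ignores any further limits a real system might apply.

```python
def per_vm_outstanding_limit(active_vms, sched_num_req_outstanding, adapter_queue_depth):
    """Return the limit that governs outstanding requests to one LUN.

    active_vms: number of virtual machines actively issuing I/O to the LUN.
    sched_num_req_outstanding: value of Disk.SchedNumReqOutstanding.
    adapter_queue_depth: queue depth of the storage adapter.
    """
    if active_vms <= 1:
        # With a single active virtual machine, the parameter does not apply
        # and the adapter queue depth is the limiting factor.
        return adapter_queue_depth
    # With two or more active virtual machines, each one is held to the parameter.
    return sched_num_req_outstanding


# Illustrative values only; check the actual settings on your hosts.
print(per_vm_outstanding_limit(1, sched_num_req_outstanding=32, adapter_queue_depth=64))  # 64
print(per_vm_outstanding_limit(3, sched_num_req_outstanding=32, adapter_queue_depth=64))  # 32
```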
Chapter 6 Managing ESX/ESXi Systems That Use SAN Storage