HP-UX 11i v3 Mass Storage I/O Performance Improvements
is also due to better cache-locality algorithms in the new mass storage stack and the rest of the HP-UX
11i v3 kernel.
The cell-local round robin policy requires at least one LUN path per cell for optimal performance.
When this requirement is met, cell-local round robin can significantly improve overall I/O
performance while decreasing CPU overhead, and it scales well with the number of cells.
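The one-path-per-cell requirement can be checked mechanically. The following is a minimal sketch only: the function name, the cell identifiers, and the (path name, cell) pair layout are hypothetical and do not correspond to any HP-UX interface.

```python
def cells_without_path(cells, lun_paths):
    """Return the cells that have no path to the LUN.

    cells:     iterable of cell identifiers (hypothetical, e.g. 0..3)
    lun_paths: iterable of (path_name, cell_id) pairs (hypothetical layout)
    """
    covered = {cell_id for _, cell_id in lun_paths}
    return sorted(set(cells) - covered)


# Hypothetical 4-cell system with 8 paths to one LUN, two paths per cell.
paths = [(f"path{i}", i % 4) for i in range(8)]

print(cells_without_path(range(4), paths))      # [] : every cell has a path
print(cells_without_path(range(4), paths[:3]))  # [3] : cell 3 has no local path
```

An empty result means the requirement is met. Any cell listed would have to fall back to paths in other cells.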
Figure 4 I/Os per second, 8 paths/LUN, Cell-Local Round-robin
[Bar chart: 1KB read IOPS and CPU utilization (%) for HP-UX 11i v2 and HP-UX 11i v3.
HP-UX 11i v2: 210,000 IOPS at 55% CPU utilization. HP-UX 11i v3: 450,000 IOPS at 42% CPU
utilization. Annotations: 2.1x improvement in IOPS, 1.3x improvement in CPU utilization.]
Note
Figure 4, which used a statically balanced workload, cannot be compared
with Figure 3, which used a statically unbalanced workload and differed in a
number of other ways in its server and mass storage configurations.
With a statically unbalanced workload, the improvement from HP-UX 11i v2 to
11i v3 in Figure 4 might have been much larger.
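The improvement factors annotated in Figure 4 can be reproduced from its data labels. The sketch below assumes that the 210,000 IOPS and 55% labels belong to HP-UX 11i v2 and the 450,000 IOPS and 42% labels to HP-UX 11i v3, which is the pairing consistent with the annotated improvements. The combined "IOPS per percent of CPU" ratio is a derived illustration and is not stated in the figure.

```python
# Data labels read from Figure 4 (1KB reads, 8 paths per LUN).
iops_v2, iops_v3 = 210_000, 450_000
cpu_v2, cpu_v3 = 55.0, 42.0  # CPU utilization, percent

iops_gain = iops_v3 / iops_v2  # throughput ratio, v3 over v2
cpu_gain = cpu_v2 / cpu_v3     # CPU utilization ratio, v2 over v3

# Derived, not stated in the figure: I/Os delivered per percent of CPU used.
efficiency_gain = (iops_v3 / cpu_v3) / (iops_v2 / cpu_v2)

print(f"{iops_gain:.1f}x IOPS, {cpu_gain:.1f}x CPU, "
      f"{efficiency_gain:.1f}x IOPS per CPU%")
# prints: 2.1x IOPS, 1.3x CPU, 2.8x IOPS per CPU%
```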
When using the cell-local round robin policy, the Mass Storage stack looks at the locality value of the
CPU initiating the specific I/O request and attempts to find a LUN path to the specific disk device with
exactly the same locality value. If suitable LUN paths with the same locality value are found, only
these are used. Otherwise, all available paths are used, much like the standard round robin policy.
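The selection logic described above can be sketched as follows. This is an illustrative model under stated assumptions, not the HP-UX implementation: the `Lun` class, the `select_path` method, and the per-locality counters are all hypothetical, and locality values are represented as plain integers.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    """Hypothetical model of a LUN and its paths; not an HP-UX structure."""
    paths: list                                    # (path_name, locality) pairs
    _counters: dict = field(default_factory=dict)  # round-robin position per group

    def select_path(self, cpu_locality):
        # Prefer paths whose locality matches the CPU issuing the I/O.
        local = [p for p in self.paths if p[1] == cpu_locality]
        # If none match, fall back to all paths, like standard round robin.
        candidates = local or self.paths
        key = cpu_locality if local else None
        n = self._counters.get(key, 0)
        self._counters[key] = n + 1
        return candidates[n % len(candidates)][0]


lun = Lun(paths=[("p0", 0), ("p1", 0), ("p2", 1), ("p3", 1)])
print([lun.select_path(0) for _ in range(3)])  # ['p0', 'p1', 'p0']
print([lun.select_path(2) for _ in range(4)])  # ['p0', 'p1', 'p2', 'p3']
```

In the first call sequence, the CPU in locality 0 rotates only over the two matching paths. In the second, no path has locality 2, so the sketch rotates over all four paths.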
The policy also has the following characteristics:
• The interface driver data structures typically reside in cell-local memory (if it is configured on
the cell-based machine). The actual data to be transferred between the device and system memory
is usually located in the file cache, which typically resides in interleaved memory (ILM). The
cell-local round robin scheduling policy tries to minimize memory access latency only for the
interface driver data structures, not for the interleaved memory accesses required to perform
the actual data transfer for an I/O.
• If a process that issues I/O is tied to a particular CPU, the cell-local round robin policy limits
the mass storage stack to the subset of paths to a LUN that have the same locality value as the
CPU on which the process runs.