HP-UX 11i v3 Mass Storage I/O Performance Improvements

Figure 3 I/Os per second (two paths per LUN)

                IOPS (I/Os per second)
                Reads      Writes
HP-UX 11i v2    153,000    147,000
HP-UX 11i v3    309,000    305,000
Speed-up        2.02x      2.07x

HP-UX 11i v2: I/Os performed on a set of legacy DSFs, all through a single HBA port.
HP-UX 11i v3: I/Os performed on the same set of legacy DSFs. Native Multi-Pathing
automatically takes advantage of paths through both HBA ports.
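The speed-up factors reported for Figure 3 can be checked directly from the IOPS values. The short sketch below recomputes them; the variable names are illustrative only and are not part of any HP tool.

```python
# IOPS values taken from Figure 3 (two paths per LUN).
iops = {
    "reads": {"11i v2": 153_000, "11i v3": 309_000},
    "writes": {"11i v2": 147_000, "11i v3": 305_000},
}

# Speed-up is the 11i v3 rate divided by the 11i v2 rate, rounded to two decimals.
speedup = {
    op: round(values["11i v3"] / values["11i v2"], 2)
    for op, values in iops.items()
}

for op, factor in speedup.items():
    print(f"{op}: {factor:.2f}x")
```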
Least-command-load versus round-robin
The HP-UX 11i v3 results in Figures 2 and 3 used the default load-balancing policy, round-robin.
The least-command-load policy generally uses more CPU than round-robin, but the difference is
typically significant only in workloads with small I/O sizes (8K or less). The CPU utilization
difference increases somewhat as the I/O load (average number of I/O requests outstanding) per
LUN increases and as the number of paths increases.
Least-command-load can have an advantage on workloads with a mixture of I/O sizes in progress
simultaneously, or in configurations with significant variations in path performance, because
it tends to balance the load better than round-robin in the presence of such inequalities.
For example, a workload with a significant mixture of I/O requests of different sizes (such as
4K, 8K, and 16K) that tend to be in progress simultaneously might see more I/O operations per
second with least-command-load. Similarly, a configuration like the one in Figure 1, but with
HBA ports 1 and 2 on 2 Gb/s HBAs and ports 3 and 4 on 4 Gb/s HBAs, might benefit from
least-command-load if the workload is sufficient to sustain I/O on multiple paths
simultaneously.
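The difference between the two policies can be illustrated with a toy model. The sketch below is not the HP-UX implementation: the class names, the tick-based simulation, and the service rates are all assumptions chosen for illustration, loosely echoing the mixed 2 Gb/s and 4 Gb/s example above. Round-robin rotates through the paths regardless of load, while least-command-load sends each request to the path with the fewest outstanding requests.

```python
from itertools import cycle


class RoundRobin:
    """Hand out paths in a fixed rotation, ignoring current load."""

    def __init__(self, paths):
        self._rotation = cycle(list(paths))

    def select(self, outstanding):
        return next(self._rotation)


class LeastCommandLoad:
    """Pick the path with the fewest outstanding I/O requests."""

    def __init__(self, paths):
        self._paths = list(paths)

    def select(self, outstanding):
        # Ties go to the first path in list order.
        return min(self._paths, key=lambda p: outstanding[p])


def simulate(policy, service_rate, ticks, arrivals_per_tick):
    """Return the per-path queue depth left after `ticks` time steps."""
    outstanding = {path: 0 for path in service_rate}
    for _ in range(ticks):
        # Issue new requests, letting the policy choose a path for each.
        for _ in range(arrivals_per_tick):
            outstanding[policy.select(outstanding)] += 1
        # Each path then completes up to `rate` requests in this tick.
        for path, rate in service_rate.items():
            outstanding[path] = max(0, outstanding[path] - rate)
    return outstanding


# Hypothetical rates: the fast path completes twice as many I/Os per tick.
SERVICE_RATE = {"slow": 1, "fast": 2}

rr_backlog = simulate(RoundRobin(SERVICE_RATE), SERVICE_RATE,
                      ticks=10, arrivals_per_tick=3)
lcl_backlog = simulate(LeastCommandLoad(SERVICE_RATE), SERVICE_RATE,
                       ticks=10, arrivals_per_tick=3)
print("round-robin backlog:       ", rr_backlog)
print("least-command-load backlog:", lcl_backlog)
```

In this model the offered load exactly matches the combined capacity of the two paths. Round-robin still lets a backlog build on the slow path, because it keeps sending that path half of the requests, whereas least-command-load keeps the backlog bounded by steering more requests to the fast path.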
Neither least-command-load nor round-robin is recommended on cell-based machines when the paths
are spread across cells. Cell-local round-robin must be used instead.
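As a rough illustration of the idea, the sketch below restricts the round-robin rotation to paths whose HBA sits in the same cell as the CPU issuing the I/O. This is an assumption about how a cell-local policy behaves, not a description of the HP-UX code. The class name, the path-to-cell mapping, and the fallback to all paths when a cell has no local path are hypothetical.

```python
from itertools import cycle


class CellLocalRoundRobin:
    """Round-robin over the paths that share a cell with the issuing CPU."""

    def __init__(self, path_cells):
        # path_cells maps each path name to the cell that hosts its HBA.
        by_cell = {}
        for path, cell in path_cells.items():
            by_cell.setdefault(cell, []).append(path)
        # One independent rotation per cell.
        self._local = {cell: cycle(paths) for cell, paths in by_cell.items()}
        # Assumed fallback: rotate over every path if a cell has none locally.
        self._fallback = cycle(list(path_cells))

    def select(self, cpu_cell):
        rotation = self._local.get(cpu_cell)
        if rotation is None:
            return next(self._fallback)
        return next(rotation)


# Hypothetical layout: four paths spread over two cells.
policy = CellLocalRoundRobin({"p0": 0, "p1": 0, "p2": 1, "p3": 1})
choices = [policy.select(cell) for cell in (0, 1, 0, 0, 1)]
print(choices)
```

Requests issued from cell 0 alternate between its two local paths and never cross to cell 1, which is the locality property that the text credits with lowering inter-cell overhead.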
Decreased CPU utilization in cell-based systems
On a cell-based system, inter-cell overheads make the use of cell-local round-robin in HP-UX 11i v3 a
significant advantage even in a completely statically balanced configuration and workload. HP
performed tests using the Disk Bench I/O benchmark tool on a 16-core, 4-cell rx8640 server with
eight 4 Gb/s Fibre Channel adapter ports connected to MSA1000 Fibre Channel disks (8
paths/LUN). The accompanying figure compares IOPS on HP-UX 11i v2 with cell-local
round-robin on HP-UX 11i v3. In these HP tests on a cell-based server, HP-UX 11i v3 showed
significant IOPS improvements: 2.1 times the IOPS of 11i v2. This is due to the decreased CPU
utilization that results from better memory locality in the I/O path. In addition to the benefits of cell-
local round-robin, which is recommended on cell-based systems, a significant portion of the speed-up