HP-UX 11i v3 Mass Storage I/O Performance Improvements


Figure 3 I/Os per second (two paths per LUN)

IOPS (I/Os per second)    Reads      Writes
HP-UX 11i v2              153,000    147,000
HP-UX 11i v3              309,000    305,000
Speed-up (v3 vs. v2)      2.02x      2.07x

11i v2: I/Os performed on a set of legacy DSFs, all through a single HBA port.
11i v3: I/Os performed on the same set of legacy DSFs. Native Multi-Pathing automatically takes
advantage of paths through both HBA ports.
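The speed-up factors shown in Figure 3 follow directly from the reported IOPS values. The short Python sketch below reproduces that arithmetic; the dictionary layout and the function name `speedup` are illustrative choices, not part of any HP tool.

```python
# IOPS values as reported in Figure 3 (two paths per LUN).
iops = {
    "reads":  {"11i v2": 153_000, "11i v3": 309_000},
    "writes": {"11i v2": 147_000, "11i v3": 305_000},
}

def speedup(op):
    """Return the ratio of 11i v3 IOPS to 11i v2 IOPS for one operation type."""
    return iops[op]["11i v3"] / iops[op]["11i v2"]

for op in iops:
    print(f"{op}: {speedup(op):.2f}x")
# Prints:
# reads: 2.02x
# writes: 2.07x
```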

Least-command-load versus round-robin

The HP-UX 11i v3 results in Figures 2 and 3 used the default load-balancing policy, round-robin.
The least-command-load policy generally has higher CPU utilization than round-robin, but the
difference is typically significant only for small-I/O workloads (I/O sizes of 8K or less). The CPU
utilization difference increases somewhat as the I/O load (average number of I/O requests
outstanding) per LUN increases and as the number of paths increases.

Least-command-load can have an advantage on workloads with a mixture of I/O sizes in progress
simultaneously, or in configurations with significant variations in path performance, because it
tends to balance the load better than round-robin in the presence of such inequalities. For example,
a workload with a significant mixture of I/O request sizes (for example, 4K, 8K, and 16K) that tend
to be in progress simultaneously might see increased I/O operations per second with
least-command-load. Similarly, a configuration like Figure 1 in which HBA ports 1 and 2 are on
2 Gb/s HBAs and ports 3 and 4 are on 4 Gb/s HBAs might benefit from least-command-load if the
workload is sufficient to sustain I/O on multiple paths simultaneously.
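The effect of unequal path performance can be illustrated with a toy model. The Python sketch below is not HP's implementation: it assumes that round-robin rotates through the paths regardless of their state and that least-command-load picks the path with the fewest outstanding commands. The class names, the `simulate` helper, and the service rates (a slow path completing 1 command per tick and a fast path completing 2) are all hypothetical.

```python
from itertools import cycle

class RoundRobin:
    """Rotate through the paths in order, ignoring their current load."""
    def __init__(self, n_paths):
        self._next = cycle(range(n_paths))

    def pick(self, outstanding):
        return next(self._next)

class LeastCommandLoad:
    """Pick the path with the fewest outstanding commands (lowest index on ties)."""
    def pick(self, outstanding):
        return min(range(len(outstanding)), key=outstanding.__getitem__)

def simulate(policy, rates, arrivals_per_tick, ticks):
    """Issue I/Os each tick, then let each path complete up to `rate` commands.

    Returns (total commands completed, per-path commands still outstanding).
    """
    outstanding = [0] * len(rates)
    completed = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            outstanding[policy.pick(outstanding)] += 1
        for path, rate in enumerate(rates):
            done = min(rate, outstanding[path])
            outstanding[path] -= done
            completed += done
    return completed, outstanding

# Hypothetical setup: path 0 is half as fast as path 1; 3 new I/Os arrive per tick.
rates = [1, 2]
rr_done, rr_left = simulate(RoundRobin(len(rates)), rates, 3, 100)
lcl_done, lcl_left = simulate(LeastCommandLoad(), rates, 3, 100)
print("round-robin:       ", rr_done, rr_left)
print("least-command-load:", lcl_done, lcl_left)
```

In this model round-robin keeps sending half of the requests to the slow path, so a backlog builds up there, while least-command-load shifts work toward the faster path. The model ignores the additional CPU cost of least-command-load discussed above.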

Neither least-command-load nor round-robin is recommended on cell-based machines when the paths
are spread across cells. Cell-local round-robin must be used instead.
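As a rough illustration of the idea behind cell-local round-robin, the Python sketch below rotates only among the paths whose HBA is in the same cell as the CPU issuing the I/O. It is a simplified model, not HP's implementation: the class name `CellLocalRoundRobin`, the path-to-cell layout, and the fallback to all paths when a cell has no local path are assumptions made for the example.

```python
from itertools import cycle

class CellLocalRoundRobin:
    """Round-robin restricted to the paths local to the issuing CPU's cell."""

    def __init__(self, path_cells):
        # path_cells[i] is the cell hosting the HBA port used by path i.
        by_cell = {}
        for path, cell in enumerate(path_cells):
            by_cell.setdefault(cell, []).append(path)
        self._local = {cell: cycle(paths) for cell, paths in by_cell.items()}
        # Assumption: if a cell has no local path, rotate over all paths.
        self._any = cycle(range(len(path_cells)))

    def pick(self, cpu_cell):
        """Return the next path index for an I/O issued from `cpu_cell`."""
        return next(self._local.get(cpu_cell, self._any))

# Hypothetical layout: 8 paths per LUN, two HBA ports in each of 4 cells.
policy = CellLocalRoundRobin([0, 0, 1, 1, 2, 2, 3, 3])
picks_from_cell_2 = [policy.pick(2) for _ in range(4)]
print(picks_from_cell_2)  # [4, 5, 4, 5]
```

Because every I/O stays on a path in the issuing CPU's own cell, requests in this model never cross a cell boundary on the way to the HBA.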

Decreased CPU utilization in cell-based systems

On a cell-based system, inter-cell overheads make the use of cell-local round-robin in HP-UX 11i v3 a

significant advantage even in a completely statically balanced configuration and workload. HP

performed tests using the Disk Bench I/O benchmark tool on a 16-core, 4-cell rx8640 server with

eight 4 Gb/s Fibre Channel adapter ports connected to MSA1000 Fibre Channel disks (8

paths/LUN). The accompanying figure compares IOPS on HP-UX 11i v2 with cell-local

round-robin on HP-UX 11i v3. In these HP tests on a cell-based server, HP-UX 11i v3 showed

significant IOPS improvements: 2.1 times the IOPS of 11i v2. This is due to the decreased CPU

utilization that results from better memory locality in the I/O path. In addition to the benefits of cell-

local round-robin, which is recommended on cell-based systems, a significant portion of the speed-up