Locality-Optimized Resource Alignment

Executive summary

Locality-Optimized Resource Alignment (hereinafter LORA) is a framework for increasing system
performance by exploiting the locality domains in HP servers with Non-Uniform Memory Architecture
(NUMA). LORA consists of configuration rules and tuning recommendations, plus HP-UX 11i v3
enhancements and tools to support them.

LORA introduces a new mode to supplement the Symmetric Multiprocessing (SMP) mode originally
implemented in HP-UX. LORA takes advantage of locality in NUMA platforms, whereas the SMP
approach treats memory resources symmetrically. For application workloads that exhibit
locality of memory reference, systems configured in accordance with LORA will typically see a 20%
performance improvement compared to the SMP mode used with interleaved memory.

The advanced power controls in HP servers offer the opportunity for great power savings when
platform hardware is not fully utilized. Because the power domains generally correspond to the
locality domains, LORA configurations naturally mesh with a power conservation strategy.

The body of this white paper contains sections describing background and motivation, scope,
configuration rules, and system administration recommendations. The technical details behind these
topics appear in the appendices.

LORA was first introduced in September 2008 with Update 3 to HP-UX 11i v3. Here are the major
improvements delivered in September 2009 with Update 5:

• The new parconfig command makes configuring nPartitions much simpler.
• The procedure for creating well-aligned vPars instances is simpler, and those instances are
  fully compatible with gWLM dynamic processor migration operations.
• HP now recommends deploying Integrity Virtual Machines in LORA mode.
• LORA mode is now recommended for more application classes.
• There is less need for system administrators to perform explicit tuning, because HP-UX
  implements heuristics to perform resource alignment automatically.
• There is a new command, loratune, to tune up resource alignment.

Background and motivation

Structure of HP servers

HP midrange and high-end servers are constructed as a complex of multiple modular units containing
the hardware processing resources. This structure yields great advantages: a single family of servers
can span the range from an economical 4 processor cores up to 128 processor cores delivering
world-class performance, with similar scaling in the amount of memory and number of I/O slots.
Moreover, the complex can be partitioned to support multiple independent and isolated application
workloads, with each partition sized to have the right amount of hardware resources for its workload.

A consequence of this structure is that the processing resources within the complex are grouped into a
set of localities. For any given processor core, memory access latency depends on where that
memory is located. This property is called Non-Uniform Memory Architecture (NUMA).

Interleaved memory

Interleaved memory (ILM) is a technique for masking the NUMA properties of a system. Successive
cache lines in the memory address space are drawn from different localities, making the average
memory access latency roughly uniform.
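
The effect of interleaving on average latency can be illustrated with a small model. The sketch below is not taken from HP documentation: it assumes simple round-robin placement of cache lines across localities, a two-level local/remote latency split, and made-up values for the cache-line size, locality count, and latencies. Real hardware may use a different mapping and more than two latency levels.

```python
# Illustrative model only: the cache-line size, locality count, and latency
# figures below are assumptions, not measurements of any HP server.
CACHE_LINE_BYTES = 128      # assumed cache-line size
NUM_LOCALITIES = 4          # assumed number of localities in the partition
LOCAL_LATENCY_NS = 200.0    # assumed latency to memory in the core's own locality
REMOTE_LATENCY_NS = 400.0   # assumed latency to memory in any other locality


def interleaved_locality(address: int) -> int:
    """Return the locality backing this address under round-robin interleaving."""
    return (address // CACHE_LINE_BYTES) % NUM_LOCALITIES


def access_latency_ns(core_locality: int, memory_locality: int) -> float:
    """Latency seen by a core, depending on where the memory is located."""
    if core_locality == memory_locality:
        return LOCAL_LATENCY_NS
    return REMOTE_LATENCY_NS


def average_interleaved_latency_ns(core_locality: int, num_lines: int) -> float:
    """Average latency for a core reading num_lines successive cache lines."""
    total = 0.0
    for line in range(num_lines):
        address = line * CACHE_LINE_BYTES
        total += access_latency_ns(core_locality, interleaved_locality(address))
    return total / num_lines


if __name__ == "__main__":
    # Every core sees the same average, whichever locality it belongs to.
    for core in range(NUM_LOCALITIES):
        print(core, average_interleaved_latency_ns(core, 400))
```

Under these assumed figures, one access in four is local and three in four are remote, so each core averages 350 ns regardless of its locality. A workload whose references all fall in the core's own locality would instead see the 200 ns local latency, which is the kind of gap a locality-aware configuration tries to recover.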