Locality-Optimized Resource Alignment
Executive summary
Locality-Optimized Resource Alignment (hereinafter LORA) is a framework for increasing system
performance by exploiting the locality domains in HP servers with Non-Uniform Memory Architecture.
LORA consists of configuration rules and tuning recommendations, plus HP-UX 11i v3 enhancements
and tools to support them.
LORA introduces a new mode to supplement the Symmetric Multiprocessing (SMP) mode originally
implemented in HP-UX. LORA takes advantage of locality in NUMA platforms, whereas the SMP
approach treats all memory resources symmetrically. For application workloads that exhibit
locality of memory reference, systems configured in accordance with LORA will typically see a 20%
performance improvement compared to the SMP mode used with interleaved memory.
The advanced power controls in HP servers offer the opportunity for great power savings when
platform hardware is not fully utilized. Because the power domains generally correspond to the
locality domains, LORA configurations naturally mesh with a power conservation strategy.
The body of this white paper contains sections describing background and motivation, scope,
configuration rules, and system administration recommendations. The technical details behind these
topics appear in the appendices.
LORA was first introduced in September 2008 with Update 3 to HP-UX 11i v3. Here are the major
improvements delivered in September 2009 with Update 5:
• The new parconfig command makes configuring nPartitions much simpler.
• The procedure for creating well-aligned vPars instances is simpler, and those instances are
fully compatible with gWLM dynamic processor migration operations.
• HP now recommends deploying Integrity Virtual Machines in LORA mode.
• LORA mode is now recommended for more application classes.
• There is less need for system administrators to perform explicit tuning, because HP-UX
implements heuristics to perform resource alignment automatically.
• There is a new command, loratune, for tuning resource alignment.
Background and motivation
Structure of HP servers
HP midrange and high-end servers are constructed as a complex of multiple modular units containing
the hardware processing resources. This structure yields great advantages: a single family of servers
can span the range from an economical 4 processor cores up to 128 processor cores for world-class
performance, with similar scaling in the amount of memory and number of I/O slots. Moreover,
the complex can be partitioned to support multiple independent and isolated application workloads,
with each partition sized to have the right amount of hardware resources for its workload.
A consequence of this structure is that the processing resources within the complex are grouped into a
set of localities. For any given processor core, memory access latency depends on where the
accessed memory is located. This property is called Non-Uniform Memory Architecture (NUMA).
Interleaved memory
Interleaved memory (ILM) is a technique for masking the NUMA properties of a system. Successive
cache lines in the memory address space are drawn from different localities, making the average
memory access latency time more-or-less uniform.
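The averaging effect of interleaving can be illustrated with a small model. The Python sketch below is illustrative only: the cache line size, the number of localities, the two latency figures, and the simple round-robin mapping are all assumptions chosen for the example, not HP hardware parameters, and real interleaving schemes may be more elaborate. It shows that when successive cache lines rotate across localities, every core sees the same average latency regardless of which locality it belongs to.

```python
# Toy model of interleaved memory (ILM). All constants are hypothetical
# values chosen for illustration; they are not HP platform figures.
CACHE_LINE_BYTES = 128    # assumed cache line size
NUM_LOCALITIES = 4        # assumed number of localities in the partition
LOCAL_LATENCY_NS = 200    # hypothetical latency to memory in the core's own locality
REMOTE_LATENCY_NS = 400   # hypothetical latency to memory in any other locality


def locality_of(address: int) -> int:
    """Return the locality that backs the cache line containing `address`.

    Successive cache lines are assigned to localities in round-robin order,
    which is the simplest possible form of interleaving.
    """
    return (address // CACHE_LINE_BYTES) % NUM_LOCALITIES


def access_latency_ns(core_locality: int, address: int) -> int:
    """Latency seen by a core in `core_locality` when it touches `address`."""
    if locality_of(address) == core_locality:
        return LOCAL_LATENCY_NS
    return REMOTE_LATENCY_NS


def average_latency_ns(core_locality: int, start: int, num_lines: int) -> float:
    """Average latency for a sequential walk over `num_lines` cache lines."""
    total = sum(
        access_latency_ns(core_locality, start + i * CACHE_LINE_BYTES)
        for i in range(num_lines)
    )
    return total / num_lines


if __name__ == "__main__":
    for core in range(NUM_LOCALITIES):
        print(core, average_latency_ns(core, start=0, num_lines=1024))
```

In this model one access in four is local and the other three are remote, so each core observes the same blended latency. That uniformity is what interleaving provides; the cost is that no core can obtain consistently local access, which is the opportunity LORA targets for workloads with locality of memory reference.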