Locality-Optimized Resource Alignment for Superdome 2

Executive summary

Locality-Optimized Resource Alignment (LORA) is a framework for increasing system performance by exploiting the locality domains in HP servers with Non-Uniform Memory Architecture (NUMA), such as Superdome 2. LORA consists of configuration rules and tuning recommendations, plus HP-UX 11i v3 enhancements and tools to support them.

LORA introduces a new mode to supplement the Symmetric Multiprocessing (SMP) mode originally implemented in HP-UX. LORA exploits the locality differences inherent in NUMA platforms to gain performance efficiency, whereas the SMP approach treats all memory resources symmetrically, as if every memory location were equally distant from every processor core.

For application workloads that exhibit spatial locality in their memory reference pattern, systems configured in accordance with LORA will typically see a 20% performance improvement compared to the SMP mode used with interleaved memory.
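
As a rough illustration of why placement matters, the following Python sketch counts how many of a thread's memory references are served by its own socket under two placement policies. It assumes (this is not stated in the paper) that interleaved memory is spread round-robin across all sockets at cache-line granularity and that the thread touches its memory uniformly; the function names are hypothetical and do not correspond to any HP-UX interface.

```python
from typing import Callable

# Hypothetical placement policy: maps (cache line, thread's socket, socket count)
# to the socket whose DIMMs hold that line.
Placement = Callable[[int, int, int], int]


def interleaved(line: int, thread_socket: int, num_sockets: int) -> int:
    # Assumed round-robin interleave: consecutive lines rotate across all
    # sockets, regardless of which socket the referencing thread runs on.
    return line % num_sockets


def socket_local(line: int, thread_socket: int, num_sockets: int) -> int:
    # Locality-aware placement: every line lives on the thread's own socket.
    return thread_socket


def local_fraction(place: Placement, num_lines: int,
                   thread_socket: int, num_sockets: int) -> float:
    """Fraction of uniformly touched lines served by the thread's own socket."""
    hits = sum(
        1 for line in range(num_lines)
        if place(line, thread_socket, num_sockets) == thread_socket
    )
    return hits / num_lines


if __name__ == "__main__":
    sockets = 16  # 64 cores / 4 cores per socket
    print(local_fraction(interleaved, 1024, 3, sockets))   # 0.0625
    print(local_fraction(socket_local, 1024, 3, sockets))  # 1.0
```

Under these assumptions, interleaving leaves only 1/N of references local on an N-socket partition, while locality-aware placement keeps them all on the fast local path. The sketch models only where references land, not how much faster local references are, so it says nothing about the size of the performance gain for a real workload.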

LORA was first introduced in September 2008, with Update 3 to HP-UX 11i v3, and has been enhanced and extended to work well with Superdome 2. This white paper applies to the first release of the Superdome 2 platform in August 2010.

The body of this white paper contains sections describing the background and motivation for the LORA concept, configuration rules, and system administration recommendations. The technical details behind these topics appear in the appendices.

Background and motivation of LORA

Structure of Superdome 2 servers

Superdome 2 servers are constructed from multiple modular units containing the hardware resources. This structure yields great advantages: a single family of server configurations can span the range from an economical 4 processor cores up to a world-class 64 processor cores, with similar scaling in the amount of memory and number of I/O slots. Moreover, Superdome 2 supports nPartitions, meaning that the server complex can be divided into multiple independent and isolated partitions, with each partition sized to have the right amount of hardware resources for its application workload.

A consequence of this structure is that the hardware resources within a server complex are grouped into a set of localities. For any given processor core, the memory access latency depends on where that memory is located. This is called Non-Uniform Memory Architecture (NUMA). For Superdome 2 servers, the localities are oriented around sockets.

The diagram in Figure 1 shows the structure of a Superdome 2 enclosure at the conceptual level. The diagram illustrates these key points:

• Blades and I/O Expansion Enclosures are the physical units that build up the server complex.

• Each blade contains two sockets. Each socket contains four processor cores and is associated with a set of DIMMs that hold its local memory.

• Sockets are the units of locality for processor cores and memory. The cores and memory on a socket communicate through a fast local path. The two sockets on a blade communicate through a local interconnect. The interconnect fabric allows blades in the same nPartition to communicate with each other.

• I/O Expansion Enclosures are attached to the interconnect fabric as peers of the blades.
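
The socket and blade hierarchy described above can be sketched as a small Python model that classifies which path a memory reference takes between a core and the socket whose DIMMs hold the memory. The consecutive core numbering and all class and function names are hypothetical; they illustrate the topology only and do not reflect how HP-UX actually enumerates cores or locality domains. I/O Expansion Enclosures are left out.

```python
from dataclasses import dataclass
from enum import Enum

CORES_PER_SOCKET = 4   # from the text: four processor cores per socket
SOCKETS_PER_BLADE = 2  # from the text: two sockets per blade


class Path(Enum):
    SOCKET_LOCAL = "fast local path within the socket"
    BLADE_LOCAL = "local interconnect between the blade's two sockets"
    FABRIC = "interconnect fabric between blades in the nPartition"


@dataclass(frozen=True)
class Location:
    blade: int
    socket: int  # socket index within the blade: 0 or 1


def core_location(core_id: int) -> Location:
    # Hypothetical numbering: cores are numbered consecutively,
    # socket by socket and blade by blade.
    global_socket = core_id // CORES_PER_SOCKET
    return Location(blade=global_socket // SOCKETS_PER_BLADE,
                    socket=global_socket % SOCKETS_PER_BLADE)


def access_path(core: Location, memory: Location) -> Path:
    """Path used when a core on `core` reads memory attached to `memory`."""
    if core == memory:
        return Path.SOCKET_LOCAL
    if core.blade == memory.blade:
        return Path.BLADE_LOCAL
    return Path.FABRIC


if __name__ == "__main__":
    core = core_location(5)  # Location(blade=0, socket=1)
    for mem in (Location(0, 1), Location(0, 0), Location(3, 0)):
        print(mem, access_path(core, mem).name)
```

Because sockets are the unit of locality, the model keys memory by the owning socket rather than by individual DIMM; every reference then falls into exactly one of the three paths the text describes.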