Locality-Optimized Resource Alignment for Superdome 2

Executive summary
Locality-Optimized Resource Alignment (LORA) is a framework for increasing system performance by
exploiting the locality domains in HP servers with Non-Uniform Memory Architecture, such as
Superdome 2. LORA consists of configuration rules and tuning recommendations, plus HP-UX 11i v3
enhancements and tools to support them.
LORA introduces a new mode to supplement the Symmetric Multiprocessing (SMP) mode originally
implemented in HP-UX. LORA exploits the locality differences inherent in NUMA platforms to gain
performance efficiency, while the SMP approach treats the memory resources in a symmetric manner.
For application workloads that exhibit spatial locality in their memory reference pattern, systems
configured in accordance with LORA will typically see a 20% performance improvement compared to
the SMP mode used with interleaved memory.
LORA was first introduced in September 2008, with Update 3 to HP-UX 11i v3, and has been
enhanced and extended to work well with Superdome 2. This white paper applies to the first release
of the Superdome 2 platform in August 2010.
The body of this white paper contains sections describing background and motivation of the LORA
concept, configuration rules, and system administration recommendations. The technical details
behind these topics appear in the appendices.
Background and motivation of LORA
Structure of Superdome 2 servers
Superdome 2 servers are constructed from multiple modular units containing the hardware resources.
This structure yields great advantages: a single family of server configurations can span the range
from an economical 4 processor cores up to a world-class 64 processor cores, with similar
scaling in the amount of memory and number of I/O slots. Moreover, Superdome 2 supports
nPartitions, meaning that the server complex can be divided into multiple independent and isolated
partitions, with each partition sized to have the right amount of hardware resources for its application
workload.
A consequence of this structure is that the hardware resources within a server complex are grouped
into a set of localities. For any given processor core, the memory access latency depends on
where that memory is located. This is called Non-Uniform Memory Architecture (NUMA). For
Superdome 2 servers, the localities are oriented around sockets.
The diagram in Figure 1 shows the structure of a Superdome 2 enclosure at the conceptual level. The
diagram illustrates these key points:
• Blades and I/O Expansion Enclosures are the physical units that build up the server complex.
• Each blade contains two sockets. Each socket contains four processor cores and is associated with
  a set of DIMMs that hold its local memory.
• Sockets are the units of locality for processor cores and memory. The cores and memory on a
  socket communicate through a fast local path. The two sockets on a blade communicate through a
  local interconnect. The interconnect fabric allows blades in the same nPartition to communicate
  with each other.
• I/O Expansion Enclosures are attached to the interconnect fabric as peers of the blades.
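
The locality structure described in these points can be sketched as a small model. The following Python fragment is only an illustration, not part of HP-UX or any HP tool: the names Core, build_partition, and memory_path are hypothetical, and the eight-blade partition size is an assumption derived from the 64-core maximum divided by the eight cores per blade stated above. It classifies a memory reference by the path it would take: the fast local path within a socket, the local interconnect between the two sockets of a blade, or the interconnect fabric between blades.

```python
from dataclasses import dataclass

# Figures taken from the text above.
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 4


@dataclass(frozen=True)
class Core:
    """Hypothetical identifier for one processor core in an nPartition."""
    blade: int   # blade index within the nPartition
    socket: int  # socket index within the blade (0 or 1)
    core: int    # core index within the socket (0 to 3)


def build_partition(num_blades):
    """List every core in an nPartition built from num_blades blades."""
    return [
        Core(blade, socket, core)
        for blade in range(num_blades)
        for socket in range(SOCKETS_PER_BLADE)
        for core in range(CORES_PER_SOCKET)
    ]


def memory_path(core, mem_blade, mem_socket):
    """Name the path a core uses to reach memory owned by a given socket."""
    if core.blade == mem_blade and core.socket == mem_socket:
        return "local"   # same socket: fast local path
    if core.blade == mem_blade:
        return "blade"   # other socket on the same blade: local interconnect
    return "fabric"      # socket on another blade: interconnect fabric


# Assumption: an eight-blade partition, which matches the 64-core maximum.
cores = build_partition(8)
first = cores[0]
print(len(cores))
print(memory_path(first, 0, 0), memory_path(first, 0, 1), memory_path(first, 5, 0))
```

The model deliberately attaches no latency figures to the three paths, because none are given in this section. It captures only the ordering implied by the text: a reference within the socket stays on the fast local path, while other references cross the blade's local interconnect or the interconnect fabric.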