ccNUMA Overview

Introduction

ccNUMA (cache-coherent Non-Uniform Memory Architecture) systems offer programmers and users the simplicity
and flexibility of symmetric multiprocessing (SMP) together with the memory scalability of clusters. In a ccNUMA
system, processors, memory, and I/O are grouped together into cells. Communication within a cell is “fast” in
both latency and bandwidth, while communication that crosses a cell boundary is “slow.” Because the memory in
ccNUMA systems is physically distributed but logically shared, these systems offer better performance to
applications that are optimized to use their features. Non-optimized applications also benefit: the default
behavior is designed to be benign, if not beneficial, and these applications still have access to much larger
shared resources of memory, CPUs, and disk space.

HP-UX makes use of ccNUMA technology on HP's newest, most scalable machines. The HP Integrity systems will
have ccNUMA support in HP-UX 11i v2, while the HP 9000 systems will have ccNUMA support in HP-UX 11i v3.

Executive summary

The newest, largest HP systems use a design called ccNUMA. In this design, processors, memory, and I/O are
grouped together into cells. Accesses to memory within a cell are “fast” in both latency and bandwidth, while
accesses to memory outside the cell are “slow.” HP-UX takes advantage of this design by attempting to keep the
majority of accesses local to the same cell. It does so through good default behavior and by providing
interfaces that let ccNUMA-aware applications make the best use of the architecture.
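To see why keeping accesses local matters, the following is a minimal sketch of a weighted-average latency model. The function name `average_latency_ns` and the 100 ns and 300 ns figures are hypothetical values chosen for illustration; they are not measurements of any HP system.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Average memory latency when a given fraction of accesses stays in the local cell."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    # Weighted average of the "fast" (local) and "slow" (remote) latencies.
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 100.0
REMOTE_NS = 300.0

for fraction in (0.25, 0.5, 0.75, 1.0):
    print(fraction, average_latency_ns(fraction, LOCAL_NS, REMOTE_NS))
```

With these assumed figures, raising the share of local accesses from 50% to 75% lowers the average latency from 200 ns to 150 ns.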

Problem statement

SMP systems cannot offer sufficient memory bandwidth for large numbers of processors without incurring
excessive penalties due to memory latency. ccNUMA architectures offer scalable memory bandwidth while
keeping memory latencies reasonable, often delivering latencies in the same class as much smaller systems.

Many applications are intended to run unmodified and still perform well. To reach maximum performance,
however, important applications may need to be modified to exploit the new features of HP-UX 11i v2.

Introduction to ccNUMA

Physical memory characteristics

Physical memory refers to the actual physical arrangement and connection of the memory to the rest of the
computer system. Because applications running on HP-UX deal with the system in terms of virtual memory,
application designers are not commonly asked to understand physical memory or to optimize for it.

In cell-based systems such as high-end HP servers, memory is arranged in cells, often symmetrically throughout
the system. Symmetry is not required, however: because of budget constraints or failures, memory may not be
evenly distributed across all the cells. These systems can support any amount of memory up to 32 GB per cell.
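The layout described above can be pictured with a short sketch that totals the memory across cells and reports whether the configuration is symmetric. The helper `summarize_cells` and the four-cell configuration are hypothetical. Only the 32 GB per-cell limit comes from the text.

```python
MAX_CELL_MEMORY_GB = 32  # per-cell limit stated in the text


def summarize_cells(cell_memory_gb):
    """Return (total_gb, is_symmetric) for a list of per-cell memory sizes in GB."""
    for index, size in enumerate(cell_memory_gb):
        if not 0 <= size <= MAX_CELL_MEMORY_GB:
            raise ValueError(
                f"cell {index}: {size} GB is outside 0..{MAX_CELL_MEMORY_GB} GB"
            )
    total_gb = sum(cell_memory_gb)
    # The layout is symmetric when every cell holds the same amount of memory.
    is_symmetric = len(set(cell_memory_gb)) <= 1
    return total_gb, is_symmetric


# Hypothetical four-cell machine in which one cell holds less memory than the others.
print(summarize_cells([16, 16, 8, 16]))  # prints (56, False)
```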

Capacity is the quantity of memory that is available to an application.

Latency is the time it takes for a memory reference from a processor to be satisfied by the memory system. It is
often measured in nanoseconds and is typically the “load to use” latency. This quantity includes the time the
processor takes to put the request on the memory bus, the time to transfer the request to the memory controller
and work through the cache hierarchy, the time to satisfy the request, and the time to return the data to the
register on the requesting processor.

Bandwidth is the rate at which data is transferred to the processor or to the I/O subsystem.

Occupancy is the amount of time a memory image exists in memory. This matters because many interesting
applications have very high occupancy, meaning their memory images exist for long periods of time. This gives
the system opportunities to modify and optimize the performance of an application.
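The latency and bandwidth definitions above can be combined in a common first-order model, where the time to move a block of data is roughly the latency plus the block size divided by the bandwidth. The sketch below assumes that model. The function name `transfer_time_ns` and the 100 ns and 4 GB/s figures are hypothetical, not values for any particular HP system.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_bytes_per_ns):
    """First-order estimate: fixed latency plus size divided by bandwidth.

    A bandwidth of 1 byte per nanosecond equals 1 GB/s (10**9 bytes per second).
    """
    if bandwidth_bytes_per_ns <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_ns + size_bytes / bandwidth_bytes_per_ns


# Hypothetical figures, for illustration only: 100 ns latency, 4 GB/s bandwidth.
LATENCY_NS = 100.0
BANDWIDTH = 4.0  # bytes per nanosecond

print(transfer_time_ns(64, LATENCY_NS, BANDWIDTH))           # small transfer
print(transfer_time_ns(1024 * 1024, LATENCY_NS, BANDWIDTH))  # large transfer
```

Under this model, latency dominates small transfers and bandwidth dominates large ones, so both characteristics matter when comparing local and remote memory.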