Specifications
Escala Tower PL & S, E, T System Hardware
Chapter 1: Family Overview 11/30
9es3s1c1.doc
Rev 5.9
02/12/2003
1.9. POWER3-II Architecture
The POWER3-II processor offers technical leadership for floating-point applications and
high-performance numeric-intensive computing (NIC) workstations by integrating two floating-point,
three fixed-point, and two load/store execution units in a single 64-bit POWER3 implementation.
The POWER3-II processors use a 64 KB data cache and a 32 KB instruction cache at level 1 (L1), each
128-way set associative. The large size of both caches reduces the number of cache misses and so
improves performance. Both the data and the instruction cache are parity protected.
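
As a rough illustration of the L1 geometry described above, the number of sets in a set associative
cache is its capacity divided by the product of associativity and line size. The sketch below applies
that formula to the two L1 caches. The 128-byte line size is an assumption made for the example; this
section does not state it.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    bytes_per_set = ways * line_bytes
    if size_bytes % bytes_per_set != 0:
        raise ValueError("capacity is not a whole number of sets")
    return size_bytes // bytes_per_set


# ASSUMPTION: 128-byte cache lines (not given in this section).
LINE_BYTES = 128

l1_data_sets = cache_sets(64 * 1024, ways=128, line_bytes=LINE_BYTES)
l1_instr_sets = cache_sets(32 * 1024, ways=128, line_bytes=LINE_BYTES)
print(l1_data_sets, l1_instr_sets)
```

Under that assumption, such a high associativity leaves only a handful of sets, so almost any line
can be placed in almost any slot of the cache.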
The L1 cache is supplemented by a 4 MB, 4-way set associative L2 cache located on the 375 MHz
processor card (8 MB on the 450 MHz processor card). The speed of the L2 cache depends on the
processor speed. The POWER3-II uses a private 32-byte L2 cache bus, operated at 250 MHz with the
375 MHz processor card (2:3 ratio) and at 225 MHz with the 450 MHz processor card (1:2 ratio). Both
the enhanced clock speed and the 4-way set associative organization of the L2 cache improve cache
efficiency. The L2 controller uses a least recently used (LRU) algorithm to avoid replacing recently
used cache data, and a set prediction mechanism that helps reduce L2 cache misses.
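
The LRU policy mentioned above can be illustrated with a minimal model of a single 4-way set. This is
a sketch of the general technique only, not the actual L2 controller logic; the class and function
names are hypothetical, and the set prediction mechanism is not modeled.

```python
from collections import OrderedDict


class LruSet:
    """One cache set with LRU replacement (illustrative model only)."""

    def __init__(self, ways: int = 4):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag) -> bool:
        """Return True on a hit, False on a miss (filling the line)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = None
        return False


def simulate(tags, ways: int = 4):
    """Replay a sequence of tags against one set; return hit/miss flags."""
    cache_set = LruSet(ways)
    return [cache_set.access(tag) for tag in tags]


print(simulate(["A", "B", "C", "D", "A", "E", "B", "A", "C"]))
```

In this trace, the re-access to A protects it from eviction when E arrives, so B (the least recently
used line at that point) is replaced instead.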
The L2 cache uses a direct mapped cache methodology. There is a dedicated external interface to the
L2 cache not shared with the 6XX bus. This allows concurrent access to both the L2 cache and the
6XX bus.
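
The clock ratios quoted for the private 32-byte L2 bus can be checked with a few lines of arithmetic.
The sketch below also derives a theoretical peak transfer rate, under the assumption (not stated in
this section) that the bus moves one full 32-byte beat per bus clock. The function name is
hypothetical.

```python
from fractions import Fraction

BUS_WIDTH_BYTES = 32  # private L2 cache bus width, from the text


def l2_bus(cpu_mhz: int, l2_mhz: int):
    """Return (L2:CPU clock ratio, peak MB/s) for one processor card."""
    ratio = Fraction(l2_mhz, cpu_mhz)
    # ASSUMPTION: one 32-byte transfer per L2 bus clock (decimal MB/s).
    peak_mb_per_s = BUS_WIDTH_BYTES * l2_mhz
    return ratio, peak_mb_per_s


for cpu_mhz, l2_mhz in [(375, 250), (450, 225)]:
    ratio, peak = l2_bus(cpu_mhz, l2_mhz)
    print(f"{cpu_mhz} MHz card: L2 at {l2_mhz} MHz, ratio {ratio}, peak {peak} MB/s")
```

Note that under this assumption the faster 450 MHz card has the lower peak L2 bus rate, because its
bus runs at 225 MHz rather than 250 MHz; its L2 cache is twice as large, however.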
1.10. POWER4 Architecture
The POWER4 chip has two processors on board. The two processors share a unified second level
cache, also onboard the chip, through a Core Interface Unit (CIU). The CIU is a crossbar switch
between the L2, implemented as three separate, autonomous cache controllers, and the two processors.
Each L2 cache controller can operate concurrently and feed 32 bytes of data per cycle. The CIU
connects each of the three L2 controllers to either the data cache or the instruction cache in either of
the two processors. Additionally, the CIU accepts stores from the processors across 8-byte wide buses
and sequences them to the L2 controllers. Each processor has an associated Noncacheable (NC) Unit,
which is responsible for handling instruction serializing functions and performing any noncacheable
operations in the storage hierarchy. Logically, this unit is part of the L2.
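
The figures above imply simple per-cycle peak rates through the CIU, sketched below. Two points are
assumptions made for the example: that each processor has exactly one 8-byte store bus, and the clock
frequency passed to the conversion helper, which is purely hypothetical since this section gives no
POWER4 clock speed.

```python
L2_CONTROLLERS = 3          # from the text
FEED_BYTES_PER_CYCLE = 32   # per L2 controller, from the text
STORE_BUS_BYTES = 8         # store bus width, from the text
PROCESSORS = 2              # processors per POWER4 chip, from the text


def peak_feed_bytes_per_cycle() -> int:
    """All three L2 controllers feeding data concurrently."""
    return L2_CONTROLLERS * FEED_BYTES_PER_CYCLE


def peak_store_bytes_per_cycle() -> int:
    """ASSUMPTION: one 8-byte store bus per processor."""
    return PROCESSORS * STORE_BUS_BYTES


def to_mb_per_s(bytes_per_cycle: int, clock_mhz: int) -> int:
    """Convert a per-cycle rate to decimal MB/s at a given clock."""
    return bytes_per_cycle * clock_mhz


print(peak_feed_bytes_per_cycle(), peak_store_bytes_per_cycle())
# Hypothetical 1000 MHz clock, used only to show the conversion.
print(to_mb_per_s(peak_feed_bytes_per_cycle(), clock_mhz=1000))
```

These are upper bounds for the interface widths described, not measured or guaranteed throughput.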
The directory for a third level cache, L3, and logically its controller are also located on the POWER4
chip. The actual L3 is on a separate chip. A separate functional unit, referred to as the Fabric
Controller, is responsible for controlling data flow between the L2 and L3 controller for the chip and
for POWER4 communication. The GX controller is responsible for controlling the flow of information
in and out of the system. Typically, this would be the interface to an I/O drawer attached to the
system. With the POWER4 architecture, however, this is also where an interface to a switch for
clustering multiple POWER4 nodes together would natively attach.
Also included on the chip are functions we logically call Pervasive functions. These include trace and
debug facilities used for First Failure Data Capture, Built-In Self Test (BIST) facilities, a
Performance Monitoring Unit, an interface to the Service Processor (SP) used to control the overall
system, Power On Reset (POR) sequencing logic, and error detection and logging circuitry.