Specifications

CD, DVD, BLU-RAY & PlayStation 3 Secrets
(C) www.cardan.nl pag: 24
Post Transform and Lighting Cache                          63 max vertices   45 max vertices
Total Texture Cache Per Quad of Pixel Pipes (L1 and L2)    96kB              48kB
CPU interface                                              FlexIO            PCI-Express 16x
Technology                                                 65nm/90nm         110nm
Other RSX features/differences include:
More shader instructions
Extra texture lookup logic (helps RSX transport data from XDR)
Fast vector normalize
Note that the Post Transform and Lighting Vertex Cache is located between the vertex shader and the triangle setup.
In a sample flow through the RSX, data is first processed by the 8 vertex shaders. The output is then sent to the 24
active pixel shaders, which can involve the 24 active texture units. Finally, the data is passed to the 8 Raster Operation Pipeline
units (ROPs), and on out to the GDDR3.
Note that the pixel shaders are grouped into groups of four (called Quads). There are 7 Quads, with 1 redundant, leaving 6
Quads active, which provides us with the 24 active pixel shaders listed above (6 times 4 equals 24). Since each Quad has 96kB
of L1 and L2 cache, the total RSX texture cache is 576kB. General RSX features include 2x and 4x hardware anti-aliasing, and
support for Shader Model 3.0.
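The quad arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the constant and variable names are illustrative and do not come from any RSX library.

```python
# Figures quoted in the text above; the names are illustrative only.
PIXEL_SHADERS_PER_QUAD = 4
TOTAL_QUADS = 7
REDUNDANT_QUADS = 1
TEXTURE_CACHE_PER_QUAD_KB = 96  # L1 and L2 combined, per Quad

# One Quad is redundant, so only the remaining Quads are active.
active_quads = TOTAL_QUADS - REDUNDANT_QUADS
active_pixel_shaders = active_quads * PIXEL_SHADERS_PER_QUAD
total_texture_cache_kb = active_quads * TEXTURE_CACHE_PER_QUAD_KB

print(active_quads, active_pixel_shaders, total_texture_cache_kb)
# prints: 6 24 576
```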
Although the RSX has 256MB of GDDR3 RAM, not all of it is usable. The last 4MB is reserved for keeping track of the RSX
internal state and issued commands. The 4MB of GPU Data contains RAMIN, RAMHT, RAMFC, DMA Objects, Graphic Objects,
and the Graphic Context. The following is a breakdown of the address ranges within the 256MB of the RSX.
Address Range      Size    Comment
0000000-FBFFFFF    252MB   Framebuffer
FC00000-FFFFFFF    4MB     GPU Data
FF80000-FFFFFFF    512KB   RAMIN: Instance Memory
FF90000-FF93FFF    16KB    RAMHT: Hash Table
FFA0000-FFA0FFF    4KB     RAMFC: FIFO Context
FFC0000-FFCFFFF    64KB    DMA Objects
FFD0000-FFDFFFF    64KB    Graphic Objects
FFE0000-FFFFFFF    128KB   GRAPH: Graphic Context
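As a sanity check on the breakdown above, the size of each region can be derived from its start and end address. The following is a minimal sketch: the addresses and labels are copied from the table, while the helper names (region_size, format_size) are made up for this illustration.

```python
KB = 1024
MB = 1024 * KB

# (start, inclusive end, label), copied from the address table above.
REGIONS = [
    (0x0000000, 0xFBFFFFF, "Framebuffer"),
    (0xFC00000, 0xFFFFFFF, "GPU Data"),
    (0xFF80000, 0xFFFFFFF, "RAMIN: Instance Memory"),
    (0xFF90000, 0xFF93FFF, "RAMHT: Hash Table"),
    (0xFFA0000, 0xFFA0FFF, "RAMFC: FIFO Context"),
    (0xFFC0000, 0xFFCFFFF, "DMA Objects"),
    (0xFFD0000, 0xFFDFFFF, "Graphic Objects"),
    (0xFFE0000, 0xFFFFFFF, "GRAPH: Graphic Context"),
]


def region_size(start, end):
    """Size in bytes of an address range whose end address is inclusive."""
    return end - start + 1


def format_size(num_bytes):
    """Render a byte count as whole MB when possible, otherwise as KB."""
    if num_bytes % MB == 0:
        return f"{num_bytes // MB}MB"
    return f"{num_bytes // KB}KB"


sizes = {label: region_size(start, end) for start, end, label in REGIONS}

for start, end, label in REGIONS:
    print(f"{start:07X}-{end:07X}  {format_size(sizes[label]):>6}  {label}")
```

The framebuffer and GPU Data ranges together cover the full 256MB, and every later entry falls inside the 4MB GPU Data range.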
RSX Libraries
The RSX is dedicated to 3D graphics, and developers are able to use different API libraries to access its features. The easiest
way is to use the high-level PSGL, which is basically OpenGL ES with a programmable pipeline added in. Alternatively, developers
can use LibGCM, an API that talks to the RSX at a lower level.
PSGL is actually implemented on top of LibGCM. Advanced programmers can also program the RSX by sending
commands to it directly using C or assembly. This can be done by setting up commands (via FIFO Context) and DMA Objects
and issuing them to the RSX via DMA calls.
Speed, Bandwidth, and Latency
Because of the aforementioned layout of the communication path between the different chips, and the latency and bandwidth
differences between the various components, there are different access speeds depending on the direction of the access in
relation to the source and destination.
The following is a chart showing the speed of reads and writes to the GDDR3 and XDR memory from the viewpoint of the Cell
and RSX. Note that these are measured speeds (rather than calculated speeds), and the shipped console should be somewhat slower
wherever the RSX or GDDR3 is involved, because these figures were measured with the RSX clocked at 550MHz and the GDDR3 memory
clocked at 700MHz.
The shipped PS3 has the RSX clocked at 500MHz (front and back end, although the pixel shaders run separately inside at
550MHz). In addition, the GDDR3 memory is clocked lower, at 650MHz.
Processor     256MB XDR    256MB GDDR3
Cell Read     16.8GB/s     16MB/s
Cell Write    24.9GB/s     4GB/s
RSX Read      15.5GB/s     22.4GB/s
RSX Write     10.6GB/s     22.4GB/s
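To get a feel for what these figures mean in practice, the sketch below estimates how long a transfer takes on each path, and how a measured figure might change at the shipped clock. It rests on two assumptions that are not stated in the text: that GB/s and MB/s are decimal units, and that bandwidth scales linearly with the memory clock. The function names are made up for this illustration.

```python
GB = 1_000_000_000  # decimal units assumed
MB = 1_000_000

# Measured bandwidth in bytes per second, copied from the chart above.
BANDWIDTH = {
    ("Cell", "read", "XDR"): 16.8 * GB,
    ("Cell", "read", "GDDR3"): 16 * MB,
    ("Cell", "write", "XDR"): 24.9 * GB,
    ("Cell", "write", "GDDR3"): 4 * GB,
    ("RSX", "read", "XDR"): 15.5 * GB,
    ("RSX", "read", "GDDR3"): 22.4 * GB,
    ("RSX", "write", "XDR"): 10.6 * GB,
    ("RSX", "write", "GDDR3"): 22.4 * GB,
}


def transfer_time_ms(processor, op, memory, num_bytes):
    """Estimated time in milliseconds to move num_bytes on the given path."""
    return num_bytes / BANDWIDTH[(processor, op, memory)] * 1000


def scale_by_clock(bandwidth, measured_mhz, shipped_mhz):
    """Rough estimate only: assumes bandwidth is proportional to clock speed."""
    return bandwidth * shipped_mhz / measured_mhz


# Time to move a 1MB buffer over each path.
for (processor, op, memory), _ in BANDWIDTH.items():
    ms = transfer_time_ms(processor, op, memory, 1 * MB)
    print(f"{processor} {op:<5} {memory:<5} {ms:10.4f} ms")

# RSX access to GDDR3, rescaled from the 700MHz test clock to the shipped 650MHz.
shipped_gddr3 = scale_by_clock(BANDWIDTH[("RSX", "read", "GDDR3")], 700, 650)
print(f"Estimated shipped RSX/GDDR3 bandwidth: {shipped_gddr3 / GB:.1f}GB/s")
```

The Cell reading from GDDR3 stands out: at 16MB/s, a 1MB buffer takes about 62.5 ms, roughly three orders of magnitude slower than the same read from XDR.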