Specifications

April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide

1.3 Shared Resources

AMD Opteron™ 6200 Series processors share resources in ways that boost single-core performance. The processor has numerous features designed to boost the performance of single-core jobs and of job loads that do not use all the cores. This is a good thing but, unless understood, could be viewed as poor scaling. The goal is to run single-core jobs faster and, where possible, reallocate resources and power to boost performance. As a result, a core running in a lightly loaded processor is more powerful in many dimensions. The following features can be shared and reallocated to some degree.

• POWER AND FREQUENCY

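- We can boost frequencies by allocating more power to a single core, subject to a total power limit and other restrictions. Core pairs that are idle can “go to sleep” and turn off their power draw, allowing more power to be dedicated to active cores.

- A single-core job could see the processor running at 3.2 GHz, whereas with all cores active the processor would run at 2.3 GHz (for the top 16-core, standard-power AMD Opteron™ 6276 processors). A purely frequency-bound workload spread across all 16 cores can therefore be at most 16 × 2.3 / 3.2 = 11.5 times faster than the boosted single-core run, which is the effect that can be mistaken for poor scaling.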

• FLOATING POINT UNIT

- The two cores in a core pair share the floating point unit. The floating point unit has two FMAC units, each able to produce a 128-bit result each cycle.

• MEMORY BANDWIDTH

- All cores on a die are behind a single memory controller. As more cores are added, the available bandwidth gets shared. Different applications have different requirements for memory bandwidth and will be affected differently by how memory is shared by the cores.

• L2 CACHE

- Each core in a core pair shares the L2 cache.

• L3 CACHE

- All cores on a die share the L3 cache. There are two dies in a package.

• INSTRUCTION FETCH AND DECODE CIRCUITRY

- The two cores in a core pair share instruction fetch and decode circuitry. This is generally invisible to program performance.

• I/O AND INTERCONNECT BANDWIDTH

- At the board level, all cores share the I/O and interconnect bandwidth.
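
On a running Linux system, the cache sharing listed above can be checked rather than assumed. The following is a minimal Python sketch, not taken from this guide, that reads the kernel's sysfs cache topology files under /sys/devices/system/cpu/cpuN/cache/indexM/ and groups the logical CPUs that share each cache instance. The helper names parse_cpu_list and cache_sharing are illustrative, and the sketch assumes the kernel exposes the level, type, and shared_cpu_list files in those directories.

```python
from collections import defaultdict
from pathlib import Path

# Standard sysfs location for per-CPU topology information on Linux.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-1,8' into [0, 1, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cache_sharing(root=SYSFS_CPU):
    """Map (cache level, cache type) to the set of CPU groups sharing one cache."""
    groups = defaultdict(set)
    for index_dir in root.glob("cpu[0-9]*/cache/index[0-9]*"):
        try:
            level = (index_dir / "level").read_text().strip()
            ctype = (index_dir / "type").read_text().strip()
            shared = (index_dir / "shared_cpu_list").read_text()
        except OSError:
            # Skip entries the kernel does not expose on this system.
            continue
        groups[(level, ctype)].add(tuple(parse_cpu_list(shared)))
    return dict(groups)


if __name__ == "__main__":
    for (level, ctype), cpu_groups in sorted(cache_sharing().items()):
        for group in sorted(cpu_groups):
            print(f"L{level} {ctype}: shared by CPUs {list(group)}")
```

On a processor matching the description above, one would expect each L2 entry to list the two CPUs of a core pair and each L3 entry to list the CPUs of one die. The exact CPU numbering depends on the kernel and BIOS, so the output is best read as a map of which cores compete for which cache.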

1.4 Dedicated Resources

Inside each module (core pair) are three dedicated schedulers, one for each integer core and one to feed the Flex FP. The integer schedulers are 40-entry and the Flex FP scheduler is 60-entry. By having a dedicated scheduler for each integer core, AMD’s new core architecture helps ensure that the four integer pipelines are kept continually filled with instructions for the highest efficiency.

Each integer core has control over its own scheduling so that there is no bottleneck between the two dedicated threads that are executing in the module simultaneously. The Flex FP scheduler is a single entity because in AVX mode it needs to be able to send a single stream of 256-bit AVX operations through the FP pipes. In 128-bit mode, the extra entries in the Flex FP scheduler help ensure that the two 128-bit FMACs are receiving a constant stream of math instructions to execute.
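
The two 128-bit FMACs per module imply a simple upper bound on floating point throughput. The Python sketch below works through that arithmetic. It is not from this guide: the function names are illustrative, and it assumes that a fused multiply-add counts as two floating point operations and that both FMAC units issue one every cycle, which real code rarely sustains. The 16-core count and the 2.3 GHz all-cores-active frequency are the AMD Opteron™ 6276 figures quoted in section 1.3.

```python
FMAC_UNITS_PER_MODULE = 2   # from the text: two FMAC units in the shared FPU
FMAC_WIDTH_BITS = 128       # from the text: one 128-bit result per FMAC per cycle
FLOPS_PER_FMA = 2           # assumption: one multiply plus one add per element
CORES_PER_MODULE = 2        # from the text: a module is a pair of integer cores


def peak_flops_per_cycle(element_bits):
    """Theoretical peak flops per cycle for one module (one core pair)."""
    lanes = FMAC_WIDTH_BITS // element_bits
    return FMAC_UNITS_PER_MODULE * lanes * FLOPS_PER_FMA


def peak_gflops(cores, ghz, element_bits=64):
    """Theoretical peak GFLOPS for a processor with the given core count."""
    modules = cores // CORES_PER_MODULE
    return modules * peak_flops_per_cycle(element_bits) * ghz


if __name__ == "__main__":
    print("double precision, flops/cycle/module:", peak_flops_per_cycle(64))
    print("single precision, flops/cycle/module:", peak_flops_per_cycle(32))
    print("16 cores at 2.3 GHz, double precision GFLOPS:", peak_gflops(16, 2.3))
```

Under these assumptions a 16-core processor (8 modules) at 2.3 GHz peaks at 8 × 8 × 2.3 = 147.2 double-precision GFLOPS. If a 256-bit AVX operation occupies both 128-bit FMACs, as the single-stream description above suggests, the per-module peak is the same in either mode. When both cores of a pair run floating point code, each core can count on only half of the module's peak.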