Specifications
April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide
1.3 Shared Resources
In AMD Opteron™ 6200 Series processors, shared resources boost single-core performance. The processor has
numerous features designed to speed up single-core jobs and job loads that do not use all the cores. This is
a benefit but, unless understood, can be mistaken for poor scaling: as more cores become active, each core
receives a smaller share of these resources. The goal is to run single-core jobs faster by reallocating
resources and power where possible, so a lightly loaded core is more powerful in many dimensions. The
following features can be shared and reallocated to some degree.
• POWER AND FREQUENCY
- The processor can boost frequencies by allocating more power to a single core, subject to a total power
limit and other restrictions. Core pairs that are idle can “go to sleep” and turn off their power draw,
allowing more power to be dedicated to active cores.
- A single-core job could see a processor running at 3.2 GHz, while the same processor with all cores active
would run at 2.3 GHz (for top 16-core standard power AMD Opteron™ 6276 Series processors).
• FLOATING POINT UNIT
- The two cores in a core pair share one floating point unit. The floating point unit has two FMAC units,
each able to produce a 128-bit result each cycle.
• MEMORY BANDWIDTH
- All cores on a die are behind a single memory controller. As more cores become active, the available
bandwidth is shared among them. Applications differ in their memory bandwidth requirements and will
therefore be affected differently by this sharing.
• L2 CACHE
- The two cores in a core pair share the L2 cache.
• L3 CACHE
- All cores on a die share the L3 cache. There are two dies in a package.
• INSTRUCTION FETCH AND DECODE CIRCUITRY
- The two cores in a core pair share instruction fetch and decode circuitry. This sharing generally has no
visible effect on program performance.
• I/O AND INTERCONNECT BANDWIDTH
- At the board level, all cores share the I/O and interconnect bandwidth.
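The sharing levels listed above can be summarized as a small lookup, which may help when deciding where to place threads. The following is a minimal sketch, not a description of how Linux numbers cores: it assumes a 16-core package with two cores per core pair and eight cores per die, and it assumes core IDs are numbered consecutively so that cores 2k and 2k+1 form a core pair. The function name and constants are hypothetical. On a real system, check the actual layout (for example under /sys/devices/system/cpu/cpuN/topology/) before relying on any numbering.

```python
# Illustrative topology constants for one 16-core package.
# These are assumptions for the sketch, not values read from hardware.
CORES_PER_PAIR = 2       # two integer cores per core pair
CORES_PER_DIE = 8        # assumed: four core pairs per die
DIES_PER_PACKAGE = 2     # two dies per package, as stated in the list above
CORES_PER_PACKAGE = CORES_PER_DIE * DIES_PER_PACKAGE


def shared_resources(core_a, core_b):
    """Return the resources two distinct cores contend for.

    Assumes consecutive numbering: cores 2k and 2k+1 form a core pair,
    and each block of CORES_PER_DIE cores sits on one die.
    """
    if core_a == core_b:
        raise ValueError("expected two distinct cores")

    # At the board level, all cores share I/O and interconnect bandwidth.
    shared = ["I/O and interconnect bandwidth"]

    # Cores in the same package draw from the same total power limit.
    if core_a // CORES_PER_PACKAGE == core_b // CORES_PER_PACKAGE:
        shared.append("power budget")

    # Cores on the same die share the L3 cache and the memory controller.
    if core_a // CORES_PER_DIE == core_b // CORES_PER_DIE:
        shared += ["L3 cache", "memory controller"]

    # Cores in the same core pair share the L2, the FPU and the front end.
    if core_a // CORES_PER_PAIR == core_b // CORES_PER_PAIR:
        shared += ["L2 cache", "floating point unit",
                   "instruction fetch and decode"]

    return shared
```

Under these assumptions, two floating-point-heavy threads placed on cores 0 and 2 avoid contending for one floating point unit and L2 cache, while threads on cores 0 and 8 additionally avoid sharing an L3 cache and memory controller.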
1.4 Dedicated Resources
Inside each module are three dedicated schedulers: one for each integer core and one to feed the Flex FP.
The integer schedulers are 40-entry and the Flex FP scheduler is 60-entry. By having a dedicated scheduler
for each integer core, AMD’s new core architecture helps ensure that the four integer pipelines in each core
are kept continually filled with instructions for the highest efficiency.
Each integer core has control over its own scheduling so that there is no bottleneck between the two
dedicated threads that are executing in the module simultaneously. The Flex FP scheduler is a single entity
because in AVX mode it needs to be able to send a single stream of 256-bit AVX operations through the FP
pipes. In 128-bit mode, the extra entries in the Flex FP scheduler help ensure that the two 128-bit FMACs are
receiving a constant stream of math instructions to execute.
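The FMAC figures above imply a theoretical floating point peak per module, which can be worked out with simple arithmetic. The sketch below is an upper bound only: it assumes both FMAC units issue a fused multiply-add on every cycle, counts each fused multiply-add as two floating point operations per element, and uses hypothetical function names. Real code will rarely sustain this rate.

```python
FMAC_UNITS_PER_MODULE = 2   # from the text: two FMAC units in the shared FPU
FMAC_WIDTH_BITS = 128       # from the text: each produces a 128-bit result per cycle
FLOPS_PER_FMA = 2           # one multiply plus one add per element


def peak_flops_per_cycle(element_bits):
    """Theoretical peak flops per cycle for one module."""
    # Number of elements that fit in one 128-bit FMAC result.
    lanes = FMAC_WIDTH_BITS // element_bits
    return FMAC_UNITS_PER_MODULE * lanes * FLOPS_PER_FMA


def peak_gflops(modules, ghz, element_bits=64):
    """Theoretical peak GFLOPS for a processor with the given module count."""
    return modules * peak_flops_per_cycle(element_bits) * ghz


if __name__ == "__main__":
    # Double precision (64-bit) and single precision (32-bit) per module.
    print(peak_flops_per_cycle(64), peak_flops_per_cycle(32))
    # Assumed example: 16 cores = 8 modules at the 2.3 GHz all-cores-active clock.
    print(peak_gflops(modules=8, ghz=2.3))
```

Because the floating point unit is shared, this peak is per module rather than per core: a single active core in a module can use both FMACs, while two active cores divide them. A 256-bit AVX operation occupies both 128-bit FMACs at once, so under these assumptions the per-module peak is the same in either mode.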