Specifications

April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide
1.5 Floating Point Capabilities
Today’s server workloads require a broad mix of processor capabilities, from those using mostly integer
operations to those where floating point performance is paramount. The challenge for a general purpose
processor is to be fast and power efficient at both of these extremes. The Flex FP unit is designed to support a
wide range of applications that vary greatly in the amount of floating point work they need, while reducing power
consumption for those that need little floating point.
The Flex FP supports the next-generation AVX and FMA4 floating point instructions for both 128-bit and 256-bit
operands.
The new FMA4 instructions implement A = B * C + D in a single instruction rather than two instructions
(an FMUL followed by an FADD). An FMA4 instruction produces its result with lower latency than an FMUL and an
FADD issued separately. For scientific applications, a compiler can replace the majority of FMUL and FADD pairs
with single FMA4 instructions, reducing execution time and code size.
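As an illustration of the A = B * C + D pattern described above, the following is a minimal sketch in C. The function name scale_and_accumulate is hypothetical and does not come from this guide. The code contains no FMA4-specific syntax; it only shows the kind of loop body that a compiler permitted to target FMA4 may turn into one fused instruction per element.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: y[i] = a * x[i] + y[i] for n elements.
 * Each iteration has the A = B * C + D shape, so a compiler that is
 * allowed to target FMA4 may fuse the multiply and the add.
 */
void scale_and_accumulate(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];   /* candidate for a single fused multiply-add */
    }
}
```

One way to check the effect, assuming GCC, is to compile with -O2 -mfma4 and look for vfmadd instructions in the generated assembly. Whether the compiler fuses the operations depends on its version and its floating point contraction settings.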
The floating point unit can deliver four double-precision FLOPS per clock cycle to each core in a pair
simultaneously, for a total of eight per core pair per cycle. This is comparable to the per-core, per-cycle floating
point performance of an AMD Opteron™ 6100 CPU. Unlike prior CPUs, however, when one core is issuing fewer
floating point instructions, the other core in the pair can use its own four FLOPS/cycle plus any left unused by
its sibling, fully exploiting the capacity of the Flex FP. For example, in the extreme case of one core executing
no floating point instructions, the other core of the pair could achieve up to eight double-precision floating
point operations per cycle.
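The sharing arithmetic above can be sketched in a few lines of C. This is a simplified model built only from the figures in the text (four FLOPS/cycle per core, eight per core pair). The function names are hypothetical, and the core-pair count and clock frequency are caller-supplied assumptions rather than values from this guide.

```c
#include <assert.h>

/* Figures taken from the text: each core's nominal share and the pair total. */
enum {
    FLOPS_PER_CORE_SHARE = 4,
    FLOPS_PER_PAIR = 8
};

/*
 * Double-precision FLOPS/cycle one core can reach when its sibling
 * uses `sibling_used` FLOPS/cycle of its own four-FLOPS share.
 */
int flops_available_to_core(int sibling_used)
{
    assert(sibling_used >= 0 && sibling_used <= FLOPS_PER_CORE_SHARE);
    int sibling_unused = FLOPS_PER_CORE_SHARE - sibling_used;
    return FLOPS_PER_CORE_SHARE + sibling_unused;
}

/*
 * Theoretical double-precision peak in GFLOPS for a processor with
 * `core_pairs` core pairs running at `clock_ghz` GHz (both assumed inputs).
 */
double peak_dp_gflops(int core_pairs, double clock_ghz)
{
    assert(core_pairs >= 0 && clock_ghz >= 0.0);
    return (double)core_pairs * FLOPS_PER_PAIR * clock_ghz;
}
```

Under this model, an idle sibling leaves a core with eight FLOPS/cycle and a fully busy sibling leaves it with four. The peak figure is an upper bound only; sustained throughput depends on instruction mix and memory behavior.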