Specifications

April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide

1.5 Floating Point Capabilities

Today’s server workloads require a broad mix of processor capabilities, from those using mostly integer operations to those where floating point performance is paramount. The challenge for a general-purpose processor is to be fast and power-efficient at both of these extremes. The Flex FP is designed to support a wide range of applications that vary greatly in the amount of floating point work needed, while reducing power for those that need little floating point.

The Flex FP supports the next-generation AVX and FMA4 floating point instructions for both 128-bit and 256-bit operands.

The new FMA4 instructions implement A = B * C + D in a single instruction rather than two (an FMUL followed by an FADD). FMA4 produces the result with lower latency than a separate FMUL and FADD would. For scientific applications, a compiler can replace the majority of FMUL and FADD pairs with single FMA4 instructions, reducing both execution time and code size.
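
The multiply-then-add pattern described above shows up in ordinary source code, for example in a dot product. The following is a minimal sketch, not taken from this guide: the function name `dot` is hypothetical, and whether the loop body is actually fused depends on the compiler and its options (with GCC, something like `-O2 -mfma4` or `-march=bdver1` is assumed here).

```c
#include <stddef.h>

/* Hypothetical example: dot product written as a multiply followed by an add.
 * Each iteration has the form acc = x[i] * y[i] + acc, which matches the
 * A = B * C + D pattern. A compiler targeting FMA4 may fuse it into one
 * instruction instead of emitting a separate multiply and add. */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc = x[i] * y[i] + acc;
    }
    return acc;
}
```

Because a fused multiply-add rounds once rather than twice, results can differ in the last bit from the unfused version; code that depends on bit-exact results should be checked when enabling fusion.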

The floating point unit can deliver four double-precision FLOPS per clock cycle to each core in a pair simultaneously, for a total of eight per core pair per cycle. This is comparable to the per-core, per-cycle floating point performance of an AMD Opteron™ 6100 CPU. Unlike prior CPUs, however, when one core is issuing fewer floating point instructions, the other core in the pair can use its own four FLOPS per cycle plus any capacity left unused by the first core, fully exploiting the Flex FP. In the extreme case where one core executes no floating point instructions, the other core of the pair can achieve up to eight double-precision floating point operations per cycle.
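
The sharing arithmetic above can be written out as a small model. This is a simplified sketch built only from the per-cycle figures in the text; the helper names and the idea of expressing the partner's usage as a whole number of FLOPS per cycle are assumptions made for illustration, not a description of the hardware scheduler.

```c
/* Nominal double-precision FLOPS per cycle per core and per core pair,
 * as stated in the text above. */
enum { FLOPS_PER_CORE = 4, FLOPS_PER_PAIR = 8 };

/* Hypothetical helper: upper bound on FLOPS/cycle available to one core
 * when its partner uses partner_flops of its own nominal share.
 * The core keeps its own four and gains whatever the partner leaves unused. */
int available_flops_per_cycle(int partner_flops)
{
    if (partner_flops < 0) partner_flops = 0;
    if (partner_flops > FLOPS_PER_CORE) partner_flops = FLOPS_PER_CORE;
    return FLOPS_PER_CORE + (FLOPS_PER_CORE - partner_flops);
}

/* Hypothetical helper: theoretical peak double-precision GFLOPS for
 * core_pairs core pairs running at ghz GHz. */
double peak_gflops(int core_pairs, double ghz)
{
    return core_pairs * FLOPS_PER_PAIR * ghz;
}
```

With an idle partner, `available_flops_per_cycle(0)` gives 8; with a fully busy partner, `available_flops_per_cycle(4)` gives 4. As an illustrative input only, `peak_gflops(8, 2.0)` evaluates to 128 for eight core pairs at an assumed 2.0 GHz clock.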