Datasheet
PIC32MZ Embedded Connectivity with Floating Point Unit (EF) Family
DS60001320B-page 48 Preliminary 2015 Microchip Technology Inc.
3.1.4 FLOATING POINT UNIT (FPU)
The Floating Point Unit (FPU), Coprocessor 1 (CP1),
implements the MIPS Instruction Set Architecture for
floating point computation. The implementation supports
the ANSI/IEEE Standard 754 (IEEE Standard for Binary
Floating Point Arithmetic) for single- and double-precision
data formats. The FPU can be programmed to
have thirty-two 32-bit or 64-bit floating point registers
used for floating point operations.
The performance is optimized for single precision formats.
Most instructions have one FPU cycle throughput
and four FPU cycle latency. The FPU implements the
multiply-add (MADD) and multiply-sub (MSUB) instructions
with intermediate rounding after the multiply function.
The result is guaranteed to be the same as
executing a MUL and an ADD instruction separately,
but the instruction latency, instruction fetch, dispatch
bandwidth, and the total number of register accesses
are improved.
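
The effect of the intermediate rounding can be illustrated on any host with a short C sketch. This is an illustration under stated assumptions, not device code: the function names are hypothetical, mul_then_add() models the two-rounding behavior described above, and single_rounding_reference() approximates a fused multiply-add (one rounding) by forming the product exactly in double precision. The reference is included only for contrast; the text above states that the FPU result matches separate MUL and ADD instructions.

```c
/* Two roundings, as described for MADD/MSUB: the product is rounded to
 * single precision before the add. The volatile store keeps a host
 * compiler from contracting the expression into a fused multiply-add. */
static float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;   /* first rounding  */
    return product + c;               /* second rounding */
}

/* Contrast case (hypothetical helper): the product of two floats is exact
 * in double, so the visible rounding happens when the sum is narrowed to
 * float. The double addition can itself round in rare cases, so this is
 * an approximation of a fused operation, not a bit-exact model. */
static float single_rounding_reference(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding it to single precision drops the 2^-24 term, so mul_then_add() returns 0, while the single-rounding reference returns 2^-24.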
IEEE denormalized input operands and results are
supported by hardware for some instructions. IEEE
denormalized results are not supported by hardware in
general, but a fast flush-to-zero mode is provided to
optimize performance. The fast flush-to-zero mode is
enabled through the FS bit of the FCSR register, and use of this
mode is recommended for best performance when
denormalized results are generated.
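
A minimal sketch of enabling flush-to-zero follows. It assumes the standard MIPS32 FPU register layout, in which the flush-to-zero (FS) control is bit 24 of the Floating Point Control and Status Register (FCSR, CP1 control register 31). All function and macro names are hypothetical. The hardware access is compiled only for MIPS hard-float targets and should be checked against the device's register documentation before use; the remaining helpers are portable.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: FS (flush-to-zero) is bit 24 of FCSR, per the MIPS32 layout. */
#define FCSR_FS_BIT (UINT32_C(1) << 24)

/* Return an FCSR image with flush-to-zero enabled; other fields unchanged. */
static uint32_t fcsr_with_flush_to_zero(uint32_t fcsr)
{
    return fcsr | FCSR_FS_BIT;
}

/* True if x is an IEEE 754 single-precision denormal:
 * exponent field all zeros and fraction field non-zero. */
static int is_denormal(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return ((bits & UINT32_C(0x7F800000)) == 0) &&
           ((bits & UINT32_C(0x007FFFFF)) != 0);
}

#if defined(__mips_hard_float)
/* Read-modify-write of FCSR ($31) on a MIPS hard-float target. */
static void fpu_enable_flush_to_zero(void)
{
    uint32_t fcsr;
    __asm__ volatile("cfc1 %0, $31" : "=r"(fcsr));
    fcsr = fcsr_with_flush_to_zero(fcsr);
    __asm__ volatile("ctc1 %0, $31" : : "r"(fcsr));
}
#endif
```

is_denormal() can be used in host-side tests to find out whether a data set produces denormalized values at all, which is the case in which the mode is recommended.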
The FPU has a separate pipeline for floating point
instruction execution. This pipeline operates in parallel
with the integer core pipeline and does not stall when
the integer pipeline stalls. This allows long-running
FPU operations, such as divide or square root, to be
partially masked by system stalls and/or other integer
unit instructions. Arithmetic instructions are always
dispatched and completed in order, but loads and
stores can complete out of order. The exception model
is “precise” at all times.
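
The masking effect described above can be estimated with a simple model. The sketch below is an assumption-laden illustration, not a cycle-accurate description of the core: it assumes one FPU cycle equals one integer pipeline cycle, that the integer work is independent of the FPU result, and that the result is needed immediately after that work. The function name is hypothetical.

```c
/* Cycles the integer pipeline would still wait for an FPU result when
 * independent_cycles of unrelated integer work are placed between issuing
 * the FPU operation and using its result. */
static unsigned exposed_stall_cycles(unsigned fpu_latency,
                                     unsigned independent_cycles)
{
    return (independent_cycles >= fpu_latency)
               ? 0u
               : fpu_latency - independent_cycles;
}
```

For example, with the 17-cycle latency listed for DIV.S in Table 3-4, ten cycles of independent integer work would leave seven cycles exposed under this model.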
Table 3-4 contains the floating point instruction latencies
and repeat rates for the processor core. In this
table, “Latency” refers to the number of FPU cycles necessary
for the first instruction to produce the result
needed by the second instruction. The “Repeat Rate”
refers to the number of FPU cycles between successive
issues of the same instruction, which sets the maximum
rate at which the instruction can be executed.
TABLE 3-4: FPU INSTRUCTION LATENCIES AND REPEAT RATES

Op code                                 Latency        Repeat Rate
                                        (FPU Cycles)   (FPU Cycles)
-------------------------------------------------------------------
ABS.[S,D], NEG.[S,D], ADD.[S,D],        4              1
  SUB.[S,D], C.cond.[S,D], MUL.S
MADD.S, MSUB.S, NMADD.S, NMSUB.S,       4              1
  CABS.cond.[S,D]
CVT.D.S, CVT.PS.PW, CVT.[S,D].[W,L]     4              1
CVT.S.D, CVT.[W,L].[S,D],               4              1
  CEIL.[W,L].[S,D], FLOOR.[W,L].[S,D],
  ROUND.[W,L].[S,D], TRUNC.[W,L].[S,D]
MOV.[S,D], MOVF.[S,D], MOVN.[S,D],      4              1
  MOVT.[S,D], MOVZ.[S,D]
MUL.D                                   5              2
MADD.D, MSUB.D, NMADD.D, NMSUB.D        5              2
RECIP.S                                 13             10
RECIP.D                                 26             21
RSQRT.S                                 17             14
RSQRT.D                                 36             31
DIV.S, SQRT.S                           17             14
DIV.D, SQRT.D                           32             29
MTC1, DMTC1, LWC1, LDC1, LDXC1,         4              1
  LUXC1, LWXC1
MFC1, DMFC1, SWC1, SDC1, SDXC1,         1              1
  SUXC1, SWXC1

Legend: S = Single, D = Double, W = Word, L = Long word
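
The latency and repeat rate figures in Table 3-4 can be combined into rough cycle estimates. The sketch below is a simplified model under stated assumptions: it covers runs of a single instruction type, ignores loads, stores, and integer pipeline effects, and uses hypothetical type and function names. A dependent chain pays the full latency for each instruction, while independent instructions are limited only by the repeat rate.

```c
struct fpu_timing {
    unsigned latency;      /* FPU cycles until a dependent instruction can use the result */
    unsigned repeat_rate;  /* FPU cycles between successive issues */
};

/* A few rows taken from Table 3-4. */
static const struct fpu_timing ADD_S = { 4, 1 };
static const struct fpu_timing MUL_D = { 5, 2 };
static const struct fpu_timing DIV_S = { 17, 14 };

/* n instructions where each one consumes the previous result. */
static unsigned dependent_chain_cycles(struct fpu_timing t, unsigned n)
{
    return n * t.latency;
}

/* n mutually independent instructions: one issue every repeat_rate
 * cycles, plus the latency of the last instruction issued. */
static unsigned independent_stream_cycles(struct fpu_timing t, unsigned n)
{
    return (n == 0u) ? 0u : (n - 1u) * t.repeat_rate + t.latency;
}
```

Under this model, eight independent ADD.S instructions take 11 FPU cycles, while eight chained ADD.S instructions take 32. For DIV.S the repeat rate is close to the latency, so reordering independent divides gains much less.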