The Nios II/f core also provides a hardware divide option that includes LE-based divide circuitry in the ALU.

Including an ALU option improves the performance of one or more arithmetic instructions.

Note: The performance of the embedded multipliers differs, depending on the target FPGA family.

Table 5-3: Hardware Multiply and Divide Details for the Nios II/f Core

ALU Option                                    Hardware Details                                        Cycles per Instruction  Result Latency Cycles  Supported Instructions
No hardware multiply or divide                Multiply and divide instructions generate an exception  –                       –                      None
Logic elements                                ALU includes 32 x 4-bit multiplier                      11                      +2                     mul, muli
DSP block on Stratix III families             ALU includes 32 x 32-bit multiplier                     1                       +2                     mul, muli, mulxss, mulxsu, mulxuu
Embedded multipliers on Cyclone III families  ALU includes 32 x 16-bit multiplier                     5                       +2                     mul, muli
Hardware divide                               ALU includes multicycle divide circuit                  4 – 66                  +2                     div, divu

The cycles per instruction value determines the maximum rate at which the ALU can dispatch instructions and produce each result. The latency value determines when the result becomes available. If there is no data dependency between the results and operands for back-to-back instructions, then the latency does not affect throughput. However, if an instruction depends on the result of an earlier instruction, then the processor stalls through any result latency cycles until the result is ready.
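
The relationship between cycles per instruction and result latency can be sketched numerically. The Python snippet below is an illustrative model only, not part of any Nios II tool: the names MULTIPLIER_OPTIONS and estimate_cycles are hypothetical, the timing values are copied from Table 5-3, and the model assumes a run of identical multiply instructions with no other source of stalls (no cache misses, branches, or interrupts).

```python
# Illustrative model only; all names here are hypothetical.
# Values are (cycles per instruction, result latency cycles) from Table 5-3.
MULTIPLIER_OPTIONS = {
    "logic elements, 32 x 4-bit": (11, 2),
    "DSP block, 32 x 32-bit": (1, 2),
    "embedded multipliers, 32 x 16-bit": (5, 2),
}


def estimate_cycles(count, cycles_per_instruction, result_latency, dependent):
    """Estimate cycles until the last of `count` identical operations has a usable result.

    dependent=False: no operation uses an earlier result, so the latency is
    paid once, after the last operation has been dispatched.
    dependent=True: every operation consumes the previous result, so each one
    waits out the full result latency before it can start.
    """
    if count <= 0:
        return 0
    if dependent:
        return count * (cycles_per_instruction + result_latency)
    return count * cycles_per_instruction + result_latency


if __name__ == "__main__":
    for name, (cpi, latency) in MULTIPLIER_OPTIONS.items():
        independent = estimate_cycles(10, cpi, latency, dependent=False)
        chained = estimate_cycles(10, cpi, latency, dependent=True)
        print(f"{name}: independent={independent}, chained={chained}")
```

Under these assumptions, ten independent multiplies on the DSP block option finish in 12 cycles, while ten multiplies that each depend on the previous result take 30, which is the throughput effect of latency described above.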

In the following code example, a multiply operation (with 1 instruction cycle and 2 result latency cycles) is followed immediately by an add operation that uses the result of the multiply. On the Nios II/f core, the addi instruction, like most ALU instructions, executes in a single cycle. However, in this code example, execution of the addi instruction is delayed by two additional cycles until the multiply operation completes.

mul  r1, r2, r3    ; r1 = r2 * r3
addi r1, r1, 100   ; r1 = r1 + 100 (Depends on result of mul)

In contrast, the following code does not stall the processor.

mul  r1, r2, r3    ; r1 = r2 * r3
or   r5, r5, r6    ; No dependency on previous results
or   r7, r7, r8    ; No dependency on previous results
addi r1, r1, 100   ; r1 = r1 + 100 (Depends on result of mul)
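
The stall behavior of the two sequences above can be reproduced with a small register-readiness model. This Python sketch is a simplification for illustration, not a description of the actual Nios II/f pipeline: count_stalls and the tuple layout are hypothetical, the multiply timing (1 cycle, +2 latency) is the DSP block row of Table 5-3, and or and addi are assumed to take 1 cycle with no result latency.

```python
# Simplified in-order issue model; names and structure are hypothetical.
MUL = (1, 2)  # (cycles per instruction, result latency), DSP block row of Table 5-3
ALU = (1, 0)  # assumed timing for single-cycle ALU instructions such as or, addi


def count_stalls(program):
    """Return the number of stall cycles for a list of instructions.

    Each instruction is (dest, sources, cycles_per_instruction, result_latency).
    """
    ready = {}      # register name -> first cycle at which its value can be used
    next_issue = 0  # earliest cycle at which the ALU accepts the next instruction
    stalls = 0
    for dest, sources, cpi, latency in program:
        # An instruction cannot issue before all of its source operands are ready.
        operands_ready = max((ready.get(reg, 0) for reg in sources), default=0)
        issue = max(next_issue, operands_ready)
        stalls += issue - next_issue
        ready[dest] = issue + cpi + latency
        next_issue = issue + cpi
    return stalls


# mul followed immediately by a dependent addi
dependent_sequence = [
    ("r1", ("r2", "r3"), *MUL),
    ("r1", ("r1",), *ALU),
]

# mul, two unrelated or instructions, then the dependent addi
interleaved_sequence = [
    ("r1", ("r2", "r3"), *MUL),
    ("r5", ("r5", "r6"), *ALU),
    ("r7", ("r7", "r8"), *ALU),
    ("r1", ("r1",), *ALU),
]

if __name__ == "__main__":
    print(count_stalls(dependent_sequence))    # 2
    print(count_stalls(interleaved_sequence))  # 0
```

In this model the two independent or instructions occupy exactly the two latency cycles of the multiply, so the addi issues without waiting.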

NII51015

2015.04.02

Multiply and Divide Performance

5-5

Nios II Core Implementation Details

Altera Corporation
