
Programmers Model

ID072410 Non-Confidential

3.3.2 Load/store timings

This section describes how best to pair instructions to achieve further reductions in timing.

• STR Rx,[Ry,#imm] is always one cycle. This is because the address generation is performed in the initial cycle, and the data store is performed at the same time as the next instruction is executing. If the store is to the write buffer, and the write buffer is full or not enabled, the next instruction is delayed until the store can complete. If the store is not to the write buffer, for example to the Code segment, and that transaction stalls, the impact on timing is only felt if another load or store operation is executed before completion.

• Any load with a base update is not normally pipelined. That is, a base-update load is generally at least a two-cycle operation (more if stalled). However, if the next instruction does not write to a register, the load is reduced to one cycle. Non-register-writing instructions include CMP, TST, NOP, and non-taken IT-controlled instructions.

• LDR PC,[any] is always a blocking operation. This means at least two cycles for the load and three cycles for the pipeline reload, so this operation takes at least five cycles, or more if stalled on the load or the fetch.

Operation  Description                  Assembler                    Cycles
Bit field  Extract unsigned             UBFX Rd, Rn, #<imm>, #<imm>  1
           Extract signed               SBFX Rd, Rn, #<imm>, #<imm>  1
           Clear                        BFC Rd, #<imm>, #<imm>       1
           Insert                       BFI Rd, Rn, #<imm>, #<imm>   1
Reverse    Bytes in word                REV Rd, Rm                   1
           Bytes in both halfwords      REV16 Rd, Rm                 1
           Signed bottom halfword       REVSH Rd, Rm                 1
           Bits in word                 RBIT Rd, Rm                  1
Hint       Send event                   SEV                          1
           Wait for event               WFE                          1 + W
           Wait for interrupt           WFI                          1 + W
           No operation                 NOP                          1
Barriers   Instruction synchronization  ISB                          1 + B
           Data memory                  DMB                          1 + B
           Data synchronization         DSB <flags>                  1 + B

a. UMULL, SMULL, UMLAL, and SMLAL instructions use early termination depending on the size of the source values. These are interruptible, that is, abandoned and restarted, with a worst-case latency of one cycle.

b. Division operations use early termination to minimize the number of cycles required, based on the number of leading ones and zeroes in the input operands.

c. Neighboring load and store single instructions can pipeline their address and data phases. This enables these instructions to complete in a single execution cycle.

d. Conditional branch completes in a single cycle if the branch is not taken.

e. An IT instruction can be folded onto a preceding 16-bit Thumb instruction, enabling execution in zero cycles.

Table 3-1 Cortex-M3 instruction set summary (continued)

Operation Description Assembler Cycles