User guide

Programmers Model
ARM DDI 0337I Copyright © 2005-2008, 2010 ARM Limited. All rights reserved. 3-8
ID072410 Non-Confidential
3.3.2 Load/store timings
This section describes how best to pair instructions to reduce execution time further.
STR Rx,[Ry,#imm] is always one cycle. This is because the address generation is performed
in the initial cycle, and the data store is performed at the same time as the next instruction
is executing. If the store is to the write buffer, and the write buffer is full or not enabled,
the next instruction is delayed until the store can complete. If the store is not to the write
buffer, for example to the Code segment, and that transaction stalls, the impact on timing
is only felt if another load or store operation is executed before completion.
Any load with a base update is not normally pipelined. That is, a base update load is
generally at least a two-cycle operation (more if stalled). However, if the next instruction
does not need to read from a register, the load is reduced to one cycle. Non-register-writing
instructions include CMP, TST, NOP, and non-taken IT-controlled instructions.
LDR PC,[any] is always a blocking operation. This means at least two cycles for the load,
and three cycles for the pipeline reload. So this operation takes at least five cycles, or more
if stalled on the load or the fetch.
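The three timing rules above can be sketched as a small estimator. This is a toy model under stated assumptions, not a cycle-accurate simulator: it ignores stalls and write buffer state, it treats every other instruction as one cycle, and the (mnemonic, kind) tuples and the function name min_cycles are hypothetical conventions introduced only for illustration.

```python
# Toy minimum-cycle estimator for the load/store rules in this section.
# Assumptions (not from the manual): no stalls, write buffer never full,
# every instruction not covered by a rule costs one cycle.

# Instructions the text lists as allowing a base update load to take one
# cycle. Non-taken IT-controlled instructions are omitted for simplicity.
NON_REGISTER_WRITING = {"CMP", "TST", "NOP"}


def min_cycles(program):
    """Return the minimum cycle count for each instruction in program.

    program is a list of (mnemonic, kind) tuples, where kind is one of
    "store_imm", "load_base_update", "load_pc", or "other".
    """
    cycles = []
    for i, (mnemonic, kind) in enumerate(program):
        # Mnemonic of the following instruction, if there is one.
        nxt = program[i + 1][0] if i + 1 < len(program) else None
        if kind == "store_imm":
            # STR Rx,[Ry,#imm]: address generated in the first cycle,
            # data store overlaps the next instruction.
            cycles.append(1)
        elif kind == "load_base_update":
            # Two cycles unless the next instruction is in the listed set.
            cycles.append(1 if nxt in NON_REGISTER_WRITING else 2)
        elif kind == "load_pc":
            # LDR PC,[any]: two cycles for the load plus three for the
            # pipeline reload.
            cycles.append(2 + 3)
        else:
            cycles.append(1)  # assumption: single-cycle instruction
    return cycles


program = [
    ("STR", "store_imm"),
    ("LDR", "load_base_update"),
    ("CMP", "other"),
    ("LDR", "load_base_update"),
    ("ADD", "other"),
    ("LDR", "load_pc"),
]
print(min_cycles(program))  # [1, 1, 1, 2, 1, 5]
```

In this example the first base update load is followed by CMP and is counted as one cycle, while the second is followed by ADD and is counted as two.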
Operation  Description                  Assembler                    Cycles
Bit field  Extract unsigned             UBFX Rd, Rn, #<imm>, #<imm>  1
           Extract signed               SBFX Rd, Rn, #<imm>, #<imm>  1
           Clear                        BFC Rd, Rn, #<imm>, #<imm>   1
           Insert                       BFI Rd, Rn, #<imm>, #<imm>   1
Reverse    Bytes in word                REV Rd, Rm                   1
           Bytes in both halfwords      REV16 Rd, Rm                 1
           Signed bottom halfword       REVSH Rd, Rm                 1
           Bits in word                 RBIT Rd, Rm                  1
Hint       Send event                   SEV                          1
           Wait for event               WFE                          1 + W
           Wait for interrupt           WFI                          1 + W
           No operation                 NOP                          1
Barriers   Instruction synchronization  ISB                          1 + B
           Data memory                  DMB                          1 + B
           Data synchronization         DSB <flags>                  1 + B
a. UMULL, SMULL, UMLAL, and SMLAL instructions use early termination depending on the size of
the source values. These are interruptible, that is, abandoned and restarted, with a worst-case
latency of one cycle.
b. Division operations use early termination to minimize the number of cycles required based
on the number of leading ones and zeroes in the input operands.
c. Neighboring load and store single instructions can pipeline their address and data phases.
This enables these instructions to complete in a single execution cycle.
d. Conditional branch completes in a single cycle if the branch is not taken.
e. An IT instruction can be folded onto a preceding 16-bit Thumb instruction, enabling
execution in zero cycles.
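For readers less familiar with the bit field and reverse operations listed in the table, the following sketch models their results on 32-bit values. It is an illustrative reference model only: the Python function names and argument order are hypothetical and do not mirror the assembler operand syntax, and the lsb and width arguments are assumed to be valid (width at least 1, lsb + width at most 32) rather than checked.

```python
MASK32 = 0xFFFFFFFF  # results are truncated to a 32-bit register width


def ubfx(rn, lsb, width):
    """Extract an unsigned bit field of the given width starting at lsb."""
    return (rn >> lsb) & ((1 << width) - 1)


def sbfx(rn, lsb, width):
    """Extract a bit field and sign-extend it to 32 bits."""
    field = ubfx(rn, lsb, width)
    sign = 1 << (width - 1)
    return ((field ^ sign) - sign) & MASK32


def bfc(rd, lsb, width):
    """Clear width bits of rd starting at lsb."""
    mask = ((1 << width) - 1) << lsb
    return rd & ~mask & MASK32


def bfi(rd, rn, lsb, width):
    """Insert the low width bits of rn into rd at position lsb."""
    mask = ((1 << width) - 1) << lsb
    return ((rd & ~mask) | ((rn << lsb) & mask)) & MASK32


def rev(rm):
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes((rm & MASK32).to_bytes(4, "little"), "big")


def rev16(rm):
    """Swap the two bytes within each halfword."""
    rm &= MASK32
    return ((rm & 0x00FF00FF) << 8) | ((rm & 0xFF00FF00) >> 8)


def revsh(rm):
    """Swap the bytes of the bottom halfword and sign-extend the result."""
    half = ((rm & 0xFF) << 8) | ((rm >> 8) & 0xFF)
    return ((half ^ 0x8000) - 0x8000) & MASK32


def rbit(rm):
    """Reverse the bit order of a 32-bit word."""
    return int(format(rm & MASK32, "032b")[::-1], 2)


print(hex(ubfx(0x12345678, 8, 8)))   # 0x56
print(hex(sbfx(0x0000F000, 12, 4)))  # 0xffffffff
print(hex(rev(0x12345678)))          # 0x78563412
print(hex(rev16(0x12345678)))        # 0x34127856
```

Each of these operations is listed in the table as a single-cycle instruction. The model describes only the result value, not the timing.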
Table 3-1 Cortex-M3 instruction set summary (continued)
Operation Description Assembler Cycles