User guide
Programmers Model
ARM DDI 0337I Copyright © 2005-2008, 2010 ARM Limited. All rights reserved. 3-9
ID072410 Non-Confidential
• Any load or store whose address depends on the result of a preceding data-processing operation stalls the pipeline for an additional cycle while the register bank is updated. There is no forwarding path for this case.
• LDR Rx,[PC,#imm] might add a cycle because of contention with the fetch unit.
• TBB and TBH are also blocking operations. They take at least two cycles for the load, one cycle for the add, and three cycles for the pipeline reload. This means at least six cycles, or more if the load or the fetch stalls.
• LDR [any] instructions are pipelined when possible. If the next instruction is an LDR or STR, and the destination of the first LDR is not used to compute the address for the next instruction, one cycle is removed from the cost of the next instruction. For example, an LDR can be followed by an STR that writes out the value the LDR loaded. Multiple LDRs can be pipelined together. Some optimized examples are:
— LDR R0,[R1]; LDR R1,[R2] - normally three cycles total
— LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total
— LDR R0,[R1,R2]; STR R1,[R3,R2] - normally three cycles total
— LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] - normally four cycles total.
• No other instruction can be pipelined after an STR with register offset. An STR can only be pipelined when it follows an LDR, and nothing can be pipelined after the store. Even a stalled STR normally takes only two cycles, because of the write buffer.
• LDREX and STREX can be pipelined exactly as LDR. LDREX is treated exactly as an LDR, and STREX is treated more like an LDR than an STR, so both can be pipelined as explained for LDR.
• LDRD and STRD cannot be pipelined with preceding or following instructions. However, the two words are pipelined together, so this operation requires three cycles when not stalled.
• LDM and STM cannot be pipelined with preceding or following instructions. However, all elements after the first are pipelined together. So, a three-element LDM takes 2+1+1, or 4, cycles when not stalled. Similarly, an eight-element store takes nine cycles when not stalled. When interrupted, LDM and STM instructions continue from where they left off on return from the interrupt. Resuming adds one or two cycles to the first element transferred after the restart.
• Unaligned word or halfword loads or stores add penalty cycles. A byte-aligned halfword load or store adds one extra cycle to perform the operation as two bytes. A halfword-aligned word load or store adds one extra cycle to perform the operation as two halfwords. A byte-aligned word load or store adds two extra cycles to perform the operation as a byte, a halfword, and a byte. These numbers increase if the memory stalls. An STR or STRH cannot delay the processor, because of the write buffer.
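The following C sketch turns the best-case rules in the list above into arithmetic, so the quoted totals can be checked by hand. It is a simplified model, not a cycle-accurate simulator: it assumes zero-wait-state memory, assumes that an LDR or STR that is not pipelined costs two cycles (as the "normally three cycles total" examples imply), and ignores fetch contention and the extra stall after a data-processing instruction that feeds an address. All type and function names (`mem_op`, `ldr_str_sequence_cycles`, `ldm_stm_cycles`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of one single-word LDR or STR for this model. */
typedef enum { OP_LDR, OP_STR } op_kind;

typedef struct {
    op_kind kind;
    int dest;   /* register loaded by an LDR; -1 for an STR */
    int base;   /* base address register */
    int offset; /* offset register, or -1 for an immediate offset */
} mem_op;

/* Best-case cycles for consecutive LDR/STR instructions, assuming no wait
 * states and 2 cycles for an access that is not pipelined. */
unsigned ldr_str_sequence_cycles(const mem_op *ops, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned cost = 2;
        /* Only a preceding LDR can save a cycle; nothing is pipelined after an STR. */
        if (i > 0 && ops[i - 1].kind == OP_LDR) {
            int prev_dest = ops[i - 1].dest;
            int addr_uses_dest = ops[i].base == prev_dest ||
                                 (ops[i].offset >= 0 && ops[i].offset == prev_dest);
            /* The saving applies only if the loaded register is not part of this address. */
            if (!addr_uses_dest)
                cost -= 1;
        }
        total += cost;
    }
    return total;
}

/* LDRD/STRD: 2 cycles for the first word, 1 for the pipelined second word. */
unsigned ldrd_strd_cycles(void) { return 2 + 1; }

/* LDM/STM of n registers: 2 cycles for the first element, 1 for each later one. */
unsigned ldm_stm_cycles(unsigned n)
{
    assert(n >= 1);
    return 2 + (n - 1);
}

/* TBB/TBH lower bound: 2 (table load) + 1 (add) + 3 (pipeline reload). */
unsigned tbb_tbh_min_cycles(void) { return 2 + 1 + 3; }

/* Extra cycles for an unaligned word (size 4) or halfword (size 2) access,
 * before any memory stalls are added. */
unsigned unaligned_penalty(uint32_t addr, unsigned size)
{
    assert(size == 2 || size == 4);
    if (size == 2)
        return (addr & 1u) ? 1u : 0u;  /* byte aligned: done as two bytes */
    if ((addr & 3u) == 0u)
        return 0u;                      /* word aligned: no penalty */
    return (addr & 1u) ? 2u : 1u;       /* byte+halfword+byte, or two halfwords */
}
```

For example, encoding LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] as three `mem_op` entries gives 2 + 1 + 1 = 4 cycles, while LDR R0,[R1]; LDR R2,[R0] gives 2 + 2 = 4 because the second address uses the first load's result.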
3.3.3 Binary compatibility with other Cortex processors
The processor implements a binary compatible subset of the instruction set and features
provided by other Cortex-M profile processors. You can move software, including system level
software, from the Cortex-M3 processor to other Cortex-M profile processors.
To ensure a smooth transition, ARM recommends that code designed to operate on other
Cortex-M profile processor architectures obey the following rules and configure the
Configuration and Control Register (CCR) appropriately:
• use word transfers only to access registers in the NVIC and System Control Space (SCS).
• treat all unused SCS registers and register fields on the processor as Do-Not-Modify.
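As an illustration of these rules, the following C sketch performs a read-modify-write on a 32-bit SCS register using only word-sized accesses, and writes every bit it was not asked to change back exactly as read, so unused fields are left as Do-Not-Modify. The helper name `scs_modify32` is hypothetical. `SCB_CCR_ADDR` is the ARMv7-M System Control Block address of the CCR (0xE000ED14); which CCR bits to set is left to the caller, because this section does not specify them.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the Configuration and Control Register (CCR) in the ARMv7-M
 * System Control Block. Only dereference it on target hardware. */
#define SCB_CCR_ADDR 0xE000ED14u

/* Hypothetical helper: update a 32-bit SCS register with word transfers only.
 * Bits outside set_mask and clear_mask are written back exactly as read,
 * so unused fields are treated as Do-Not-Modify. */
void scs_modify32(volatile uint32_t *reg, uint32_t set_mask, uint32_t clear_mask)
{
    assert((set_mask & clear_mask) == 0u); /* a bit cannot be both set and cleared */
    uint32_t value = *reg;                 /* single 32-bit read */
    value &= ~clear_mask;
    value |= set_mask;
    *reg = value;                          /* single 32-bit write */
}
```

On target, a call might look like `scs_modify32((volatile uint32_t *)SCB_CCR_ADDR, set_bits, 0u);`, where `set_bits` is a placeholder for values taken from the architecture reference. The read-modify-write is not atomic, so it should not race with an interrupt handler that updates the same register. Registers with special write semantics, such as AIRCR with its VECTKEY field, need their own access sequence rather than this generic helper.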