Propeller™ P8X32A Datasheet www.parallax.com
Copyright © Parallax Inc. Page 9 of 37 Rev 1.1 9/12/2008
4.8. Assembly Instruction Execution Stages
Figure 4: Assembly Instruction Execution Stages
Clock cycle   Cog memory access        ALU              Stage (instruction N)
M             Fetch Instruction N      (Execute N-1)    1
M+1           Write Result N-1
M+2           Fetch Source N                            2
M+3           Fetch Destination N                       3
M+4           Fetch Instruction N+1    (Execute N)      4
M+5           Write Result N                            5
The Propeller executes assembly instructions in five
stages. Although a single instruction takes six clock
cycles from fetch to result write, two of those cycles
overlap with the two adjacent instructions (N-1 and N+1).
This results in an overall throughput of four clock cycles
per instruction.
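
The overlap arithmetic above can be sketched as a small timing model. This is only an illustration under the assumption that every instruction is a regular, non-Hub instruction; the names `schedule` and `total_cycles` are hypothetical and not part of any Parallax tool.

```python
# Illustrative timing model of the overlap described above (hypothetical names).
LATENCY = 6                            # cycles M through M+5 for one instruction
OVERLAPPED = 2                         # cycles shared with instructions N-1 and N+1
ISSUE_INTERVAL = LATENCY - OVERLAPPED  # 4 clock cycles between instruction fetches


def schedule(count):
    """Return (fetch_cycle, write_cycle) pairs for `count` back-to-back instructions."""
    rows = []
    for n in range(count):
        fetch = n * ISSUE_INTERVAL     # instruction n is fetched here (cycle M)
        write = fetch + LATENCY - 1    # its result is written at M+5
        rows.append((fetch, write))
    return rows


def total_cycles(count):
    """Cycles from the first fetch through the last result write, inclusive."""
    if count == 0:
        return 0
    return schedule(count)[-1][1] + 1
```

In this model, `schedule(3)` places the fetches at cycles 0, 4 and 8, and each result write lands one cycle after the following instruction's fetch, matching Figure 4.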
Figure 5: Cog Memory (Instruction N-1, Instruction N and
Instruction N+1 in successive cog memory locations, with
the Program Counter pointing to Instruction N)
In Stage 1, instruction N, pointed to by the Program
Counter, is fetched from cog memory during clock cycle
M. During cycle M+1 the result from the previous
instruction is written to memory. The reason the previous
instruction result is written after the current instruction is
fetched will be explained shortly.
During Stage 2, if the immediate flag of instruction N is
set, the 9-bit source field itself is used as the source
value. Otherwise, the cog memory location specified by the
source field is fetched during clock cycle M+2. During
clock cycle M+3 (Stage 3), the location specified by the
destination field is fetched from cog memory.
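
As a rough illustration of Stages 2 and 3, the operand selection can be modeled as below. The bit positions (immediate flag at bit 22, destination field in bits 17..9, source field in bits 8..0) are assumed from the commonly documented Propeller instruction layout rather than taken from this section, and `fetch_operands` is a hypothetical helper name.

```python
# Sketch of operand fetch; bit positions are an assumption (see lead-in).
I_FLAG_BIT = 22      # immediate flag
DEST_SHIFT = 9       # destination field occupies bits 17..9
FIELD_MASK = 0x1FF   # both fields are 9 bits wide


def fetch_operands(instruction, cog_memory):
    """Return (source_value, destination_value) for a 32-bit instruction word."""
    src_field = instruction & FIELD_MASK
    dest_field = (instruction >> DEST_SHIFT) & FIELD_MASK
    immediate = (instruction >> I_FLAG_BIT) & 1

    # Stage 2: use the field itself if immediate, otherwise read cog memory.
    source = src_field if immediate else cog_memory[src_field]
    # Stage 3: the destination location is always read from cog memory.
    destination = cog_memory[dest_field]
    return source, destination
```

With the immediate flag set, a source field of 5 yields the value 5; with it clear, the same field yields whatever is stored at cog location 5.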
At this point (Stage 4) the Arithmetic Logic Unit (ALU)
has all the information needed to execute the instruction.
The result is not available immediately; the time required
is dictated by the slowest operation the ALU performs. A
full clock cycle (M+4) is therefore given to the ALU so
that the result can settle into its final state. During
this cycle, instruction N does not access cog memory. To
increase program throughput, the next instruction (N+1) is
fetched from cog memory while instruction N executes in
the ALU.
Finally, at clock cycle M+5, the result of instruction N
is written back to cog memory, completing Stage 5.
The partial interleaving of instructions has two
implications for program flow. First, when code is
modified through MOVI, MOVS, MOVD or any other operation
that modifies an assembly instruction, at least one
instruction must be executed before the modified
instruction is executed. If the modification targets the
immediately following instruction (N+1), the unmodified
version of instruction N+1 will be fetched one clock cycle
before the modified version of instruction N+1 is written
to cog memory.
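
The self-modification hazard can be checked with a small timing sketch. It assumes straight-line code made up of regular four-cycle instructions (no jumps or Hub accesses) between the modifying instruction and its target; `fetched_version` is a hypothetical name used only for illustration.

```python
# Timing sketch of the self-modifying-code hazard (illustrative assumptions).
ISSUE_INTERVAL = 4   # cycles between successive instruction fetches
WRITE_OFFSET = 5     # result is written at M+5 relative to the fetch at M


def fetched_version(distance, old, new):
    """Return the version of the target instruction that gets fetched.

    `distance` is how many instructions after the modifying instruction
    the target sits (1 means it is the immediately following instruction).
    """
    if distance < 1:
        raise ValueError("target must follow the modifying instruction")
    target_fetch_cycle = distance * ISSUE_INTERVAL
    # The new contents are visible only to fetches after the write cycle.
    return new if target_fetch_cycle > WRITE_OFFSET else old
```

With `distance=1` the fetch happens at cycle 4, before the write at cycle 5, so the old instruction is returned; with `distance=2` the fetch at cycle 8 sees the modified instruction.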
Second, a conditional jump does not know for certain
whether it will jump until the end of clock cycle M+4.
Since the next instruction has already been fetched by
then, only one of the two possible branches can be
predicted. In the Propeller, conditional branches are
always predicted to take the jump. This favors loops using
DJNZ, where the jump is taken on every pass except the
final one, giving a tighter loop execution time.
In the event the jump is not taken, the cog takes no action
until the next instruction is fetched. This is equivalent to a
NOP being inserted before the next instruction is executed.
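
As a rough worked example of this timing, the sketch below estimates the cycle count of a DJNZ loop. It assumes every instruction in the loop body is a regular four-cycle instruction with no Hub access, and that the not-taken case costs four extra cycles (the equivalent of one NOP); `djnz_loop_cycles` is a hypothetical helper name.

```python
# Cycle estimate for a DJNZ loop (sketch under the assumptions in the lead-in).
CYCLES_PER_INSTRUCTION = 4
NOT_TAKEN_PENALTY = 4   # equivalent to one NOP when the jump is not taken


def djnz_loop_cycles(body_instructions, iterations):
    """Estimate total clock cycles for a loop closed by DJNZ."""
    if iterations < 1:
        raise ValueError("the loop runs at least once")
    # Each pass executes the body plus the DJNZ instruction itself.
    per_pass = (body_instructions + 1) * CYCLES_PER_INSTRUCTION
    # Only the final pass falls through and pays the penalty.
    return per_pass * iterations + NOT_TAKEN_PENALTY
```

For a two-instruction body repeated ten times, this gives 12 cycles per pass plus a single 4-cycle penalty on exit, or 124 clock cycles in total.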
Unconditional jumps always take four clock cycles to
execute since the Propeller can always accurately predict
what address needs to be loaded into the Program Counter
for the next instruction to execute. Examples of
unconditional jumps include JMP, JMPRET, CALL and RET.
If an instruction needs to access any Hub resource, Stage
4 is extended until the Hub becomes available, increasing
execution time to at least 7 and potentially up to 22 clock
cycles. See Section 4.4: Hub on page 7.
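
The range quoted above can be turned into a simple estimate. In this sketch the wait for the Hub is treated as a value from 0 to 15 clock cycles, which is simply the span between the stated minimum (7) and maximum (22); the function names and the 80 MHz clock used in the usage note are assumptions for illustration.

```python
# Sketch of Hub instruction timing based on the 7..22 cycle range above.
MIN_HUB_CYCLES = 7
MAX_HUB_CYCLES = 22


def hub_instruction_cycles(wait_cycles):
    """Total cycles for a Hub instruction that waits `wait_cycles` for the Hub."""
    if not 0 <= wait_cycles <= MAX_HUB_CYCLES - MIN_HUB_CYCLES:
        raise ValueError("wait must be between 0 and 15 clock cycles")
    return MIN_HUB_CYCLES + wait_cycles


def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at the given system clock."""
    return cycles * 1_000_000_000 / clock_hz
```

At an assumed 80 MHz system clock, the worst case of 22 cycles corresponds to `cycles_to_ns(22, 80_000_000)`, or 275 ns.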