Propeller™ P8X32A Datasheet www.parallax.com

4.8. Assembly Instruction Execution Stages

Figure 4: Assembly Instruction Execution Stages (stages 1 to 5 of instructions N-1, N and N+1, plotted against clock cycles M to M+5)

The Propeller executes assembly instructions in five stages. Although a single instruction spans six clock cycles from instruction fetch to result write-back, consecutive instructions overlap by two cycles: the last two cycles of one instruction are the first two cycles of the next. This results in an overall throughput of four clock cycles per instruction.
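The overlap arithmetic can be checked with a small timing model. This is a minimal sketch, assuming a straight run of ordinary four-cycle instructions (no Hub access, no untaken branches) where each instruction is fetched four cycles after the one before it; the function names are hypothetical and exist only for illustration.

```python
# Minimal timing model of the cog pipeline described above.
# Assumptions (illustration only): every instruction is an ordinary
# 4-cycle instruction, and instruction k is fetched 4 cycles after k-1.

WRITE_OFFSET = 5    # Stage 5 writes the result in cycle M+5
ISSUE_INTERVAL = 4  # cycles between the fetches of consecutive instructions


def start_cycle(k: int) -> int:
    """Clock cycle M in which instruction k (0-based) is fetched."""
    return ISSUE_INTERVAL * k


def finish_cycle(k: int) -> int:
    """Clock cycle in which instruction k writes its result (M+5)."""
    return start_cycle(k) + WRITE_OFFSET


def total_cycles(n: int) -> int:
    """Cycles from the first fetch through the last write-back, inclusive."""
    return finish_cycle(n - 1) + 1


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(f"{n:>3} instructions: {total_cycles(n):>3} cycles "
              f"({total_cycles(n) / n:.2f} per instruction)")
```

One instruction on its own occupies six cycles, but every additional instruction adds only four, so the average approaches four clock cycles per instruction as the sequence grows.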



Figure 5: Cog Memory (instructions N-1, N and N+1, addressed by the Program Counter)



In Stage 1, instruction N, pointed to by the Program Counter, is fetched from cog memory during clock cycle M. During cycle M+1 the result of the previous instruction (N-1) is written to cog memory. The reason the previous instruction's result is written after the current instruction is fetched will be explained shortly.

During Stage 2, if the immediate flag of instruction N is set, the 9-bit source field itself is saved as the source value. If the immediate flag is not set, the source field is treated as an address, and the register at that location is fetched from cog memory during clock cycle M+2. During clock cycle M+3 (Stage 3), the register at the location specified by the destination field is fetched from cog memory.
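The operand selection in Stages 2 and 3 can be sketched as a small decoder. The bit positions below are an assumption taken from the standard P8X32A instruction layout (opcode, Z, C, R and I flags, condition field, 9-bit destination field, 9-bit source field); cog memory is modelled as a plain list of 512 registers, and `fetch_operands` is a hypothetical helper, not part of any Parallax tool.

```python
# Sketch of Stages 2 and 3: choose the source value, then fetch the destination.
# Assumed P8X32A instruction layout:
#   bits 31..26 opcode | 25 Z | 24 C | 23 R | 22 I | 21..18 condition
#   bits 17..9 destination field | bits 8..0 source field

COG_RAM_SIZE = 512       # cog memory: 512 32-bit registers
FIELD_MASK = 0x1FF       # 9-bit source/destination fields
IMMEDIATE_BIT = 1 << 22  # the "I" (immediate) flag


def fetch_operands(word: int, cog_ram: list) -> tuple:
    """Return (source_value, destination_value) for one instruction word."""
    src_field = word & FIELD_MASK
    dst_field = (word >> 9) & FIELD_MASK

    if word & IMMEDIATE_BIT:
        source = src_field             # Stage 2: immediate, no memory read
    else:
        source = cog_ram[src_field]    # Stage 2: read cog memory (cycle M+2)

    destination = cog_ram[dst_field]   # Stage 3: read cog memory (cycle M+3)
    return source, destination


if __name__ == "__main__":
    ram = [0] * COG_RAM_SIZE
    ram[5] = 1234
    ram[10] = 99
    register_form = (10 << 9) | 5                   # source is register 5
    immediate_form = IMMEDIATE_BIT | (10 << 9) | 5  # source is the value 5
    print(fetch_operands(register_form, ram))   # (1234, 99)
    print(fetch_operands(immediate_form, ram))  # (5, 99)
```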

At this point (Stage 4) the Arithmetic Logic Unit (ALU) has all the information needed to execute the instruction. The result takes some time to become available, and that time is dictated by the slowest operation the ALU performs. To give every operation enough time, the ALU is allotted a full clock cycle (M+4) for its result to settle into its final state. Instruction N does not access cog memory during this cycle, so to improve program throughput, the next instruction (N+1) is fetched from cog memory while instruction N executes in the ALU.

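Finally, during clock cycle M+5, the result of instruction N is written back to cog memory, completing Stage 5. Because instruction N+1 was fetched during cycle M+4, cycle M+5 is also cycle M+1 of instruction N+1; this is why each instruction's result is written one cycle after the following instruction is fetched.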
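Laying the stages of several back-to-back instructions on a shared timeline shows why this ordering works: each clock cycle carries at most one cog memory access. The sketch below assumes ordinary four-cycle instructions only and records a source read even for immediates (where the cycle simply goes unused); `memory_schedule` is a hypothetical name.

```python
# Sketch: which cog memory access happens on each clock cycle when several
# 4-cycle instructions run back to back. Assumptions: no Hub instructions,
# no untaken branches; instruction k is fetched at cycle 4k.

from collections import defaultdict

MEMORY_ACCESSES = [            # (offset from M, action) for one instruction
    (0, "fetch instruction"),  # Stage 1
    (2, "read source"),        # Stage 2 (no read needed for an immediate)
    (3, "read destination"),   # Stage 3
    (5, "write result"),       # Stage 5; Stage 4 (M+4) does not touch memory
]


def memory_schedule(n: int) -> dict:
    """Map each clock cycle to the cog memory accesses made during it."""
    schedule = defaultdict(list)
    for k in range(n):
        m = 4 * k
        for offset, action in MEMORY_ACCESSES:
            schedule[m + offset].append(f"{action} (instr {k})")
    return dict(schedule)


if __name__ == "__main__":
    for cycle, accesses in sorted(memory_schedule(3).items()):
        print(cycle, accesses)
```

In the printed timeline, cycle 4 fetches instruction 1 while instruction 0 executes, and cycle 5 writes the result of instruction 0, which is cycle M+1 for instruction 1.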

The partial interleaving of instructions has two implications for program flow. First, when code is modified through MOVI, MOVS, MOVD or any other operation that modifies an assembly instruction, at least one other instruction must execute between the modifying instruction and the modified instruction. If the modification is made to the immediately following instruction (N+1), the unmodified version of instruction N+1 is fetched one clock cycle before the modified version of instruction N+1 is written to cog memory.
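The one-instruction gap follows from comparing two cycle numbers: the write-back of the modifying instruction at M+5 and the fetch of the target instruction. This sketch assumes only ordinary four-cycle instructions run in between; `sees_modification` and its parameter are hypothetical names.

```python
# Sketch: does a modified instruction execute in its new form?
# Instruction N (e.g. MOVS, MOVD, MOVI) writes its result at cycle M+5.
# The target instruction, with `instructions_between` instructions between
# it and N, is fetched at cycle M + 4 * (instructions_between + 1).
# Assumption: every instruction in between takes 4 cycles.

def sees_modification(instructions_between: int, m: int = 0) -> bool:
    write_cycle = m + 5
    fetch_cycle = m + 4 * (instructions_between + 1)
    # The new version is used only if it is written before it is fetched.
    return write_cycle < fetch_cycle


if __name__ == "__main__":
    for gap in range(3):
        print(gap, sees_modification(gap))
```

With no instruction in between, the target is fetched at M+4, one cycle before the write at M+5, so the old version runs; with one instruction in between, the fetch moves to M+8 and the modified version is used.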

Second, a conditional jump does not know for certain whether it will jump until the end of clock cycle M+4. Because the next instruction is fetched during that same cycle, before the outcome is known, only one of the two possible paths can be fetched, so the cog must predict which one. In the Propeller, conditional branches are always predicted to take the jump. For loops using DJNZ, where the jump is taken on every pass except the final one, this yields a tighter loop execution time.

If the jump is not taken, the cog takes no action until the correct next instruction is fetched. This is equivalent to a NOP being inserted before the next instruction executes, so an untaken conditional jump costs eight clock cycles instead of four.
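The benefit of predicting "taken" can be quantified for a DJNZ loop. This sketch assumes a taken conditional jump costs four cycles, an untaken one costs eight (four plus the NOP-equivalent), and the loop body contains only ordinary four-cycle instructions; the opposite prediction is a hypothetical comparison that the Propeller does not implement.

```python
# Sketch: clock cycles for a counted loop closed by DJNZ.
# Assumptions: taken conditional jump = 4 cycles, untaken = 8 cycles,
# and the loop body holds `body_instructions` ordinary 4-cycle instructions.

def djnz_loop_cycles(iterations: int, body_instructions: int = 0) -> int:
    if iterations < 1:
        raise ValueError("a DJNZ loop runs at least once")
    body = 4 * body_instructions * iterations
    taken = 4 * (iterations - 1)  # jumps back on every pass but the last
    not_taken = 8                 # falls through once, on the final pass
    return body + taken + not_taken


def cycles_if_not_taken_predicted(iterations: int, body_instructions: int = 0) -> int:
    """Hypothetical opposite prediction, shown only for comparison."""
    if iterations < 1:
        raise ValueError("a DJNZ loop runs at least once")
    body = 4 * body_instructions * iterations
    return body + 8 * (iterations - 1) + 4


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, djnz_loop_cycles(n, 2), cycles_if_not_taken_predicted(n, 2))
```

Only the final pass pays the four-cycle penalty, so for long loops the "taken" prediction saves nearly four cycles per iteration compared with the hypothetical alternative.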

Unconditional jumps always take four clock cycles to execute, since the Propeller always knows which address must be loaded into the Program Counter for the next instruction. Examples of unconditional jumps include JMP, JMPRET, CALL and RET.

If an instruction needs to access any Hub resource, Stage 4 is extended until the Hub becomes available, increasing the instruction's execution time to between 7 and 22 clock cycles. See Section 4.4: Hub on page 7.
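The 7 to 22 cycle range can be reproduced with a simplified model of the Hub's round-robin access. This sketch assumes, from the Hub description in Section 4.4 rather than from this section, that the Hub grants each of the eight cogs an access window once every 16 clock cycles, and that an instruction arriving exactly on its cog's window takes the minimum seven cycles; `hub_instruction_cycles` is a hypothetical name.

```python
# Simplified model of Hub instruction timing.
# Assumptions: each cog gets a Hub access window once every 16 clock cycles
# (round-robin over eight cogs), and a Hub instruction that reaches its
# access point exactly on the window takes the minimum 7 cycles. Waiting
# for the window extends Stage 4 by 0 to 15 extra cycles.

HUB_PERIOD = 16
MIN_HUB_CYCLES = 7


def hub_instruction_cycles(arrival: int, window: int) -> int:
    """Cycles for a Hub instruction reaching its access point at `arrival`,
    when this cog's window recurs at cycles congruent to `window` mod 16."""
    wait = (window - arrival) % HUB_PERIOD  # always 0..15
    return MIN_HUB_CYCLES + wait


if __name__ == "__main__":
    timings = [hub_instruction_cycles(a, 0) for a in range(HUB_PERIOD)]
    print(min(timings), max(timings))
```

Sweeping the arrival cycle over one full Hub rotation yields every value from 7 (the window is available immediately) to 22 (the window was just missed).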